Preface


To the Tenth Edition


In this edition every effort has been made to enhance further
the usefulness of the book by adding a new chapter on Statistical
Decision Theory and the latest 1977 examination questions of various
universities, revising thoroughly the Indian Statistics portion and
correcting almost all the mistakes. All answers to the Revisionary
Exercises have been checked again with the help of a calculator and
it is hoped that there are no mistakes.


Several suggestions from the readers were received for the
improvement of the book. I am grateful to them for the pains
taken. In particular, I am deeply indebted to my friend Mr.
Davendar Chatry, Lecturer, Dept. of Statistics, Tribhuvan 
University, Kathmandu, who read every page of the manuscript 
very carefully and provided very useful suggestions. Mr. Prem Raj 
Pant of the Institute of Business Administration, Tribhuvan 
University, Mr. N.C. Gupta of the Ministry of Agriculture and 
Miss Madhu Kapoor, Lecturer of Lady Shri Ram College have 
also helped a lot with their valuable suggestions. 


Since there is always scope for improvement, I shall look 
forward to and gratefully acknowledge all letters received pointing 
out necessary corrections and changes. 

S.P. GUPTA 


To the Ninth Edition 

In order to cater better to the needs of our readers, an 
attempt has been made in this edition to thoroughly improve 
the theory portion, to minimise printing mistakes, to include latest 
examination questions of different universities and to update the 
Indian Statistics portion. 

Letters were received from readers of this text, either
making some suggestions, pointing out some printing mistakes or
seeking some clarifications. I am thankful to them all for the
trouble they took in writing to me. In particular, mention may
be made of Mr. R. S. Rastogi, Head of the Deptt. of Economics,
Hans Raj College, Mr. Surendra Pradhan, Chairman, Business
Administration and Commerce Degree Programme, Tribhuvan
University, Dr. B.N. Nagnoor, Reader in Statistics, Indian Cooperation
Mission, Nepal and Mr. Jasvir Singh Arora of the Punjab and
Sindh Bank, Jullundur, for their kind suggestions for making this
text easily intelligible.

I firmly believe that there is always scope for improvement.
Hence I shall look forward to and gratefully acknowledge any
suggestions received for the improvement of this text.
S.P. GUPTA 
To the Seventh Edition


Every page of the book has been read very carefully so as to 
further improve its quality. Chapters on Regression Analysis, 
Probability, Statistical Quality Control and Analysis of Variance 
have been substantially enlarged and several new problems added 
therein. Latest questions of the various University Examinations
have been included at the appropriate place in each chapter. The
Indian Statistics portion has been thoroughly revised so as to
incorporate the latest changes. Every effort has been made to minimise
the printing mistakes.


While revising the book many useful suggestions were received.
I am grateful to all those who assisted me in this task. In
particular, mention may be made of Prof. H.C. Gupta, Head of
Deptt. of Statistics, Delhi University, Dr. Abad Ahmad, Dean,
Faculty of Management Studies, Dr. S. Neelamegham, Professor,
Faculty of Management Studies, Mr. M.P. Gupta, Reader, Faculty
of Management Studies, University of Delhi, Delhi and Mr. O.P.
Tayal of Delhi School of Economics.


I shall be failing in my duty if I do not express my gratitude
to my wife, who has assisted me in checking all the calculations
thoroughly.


I sincerely feel that the quality of any work can be improved
only through comments and criticism. Hence I shall look forward
to and gratefully acknowledge all suggestions received.


To the First Edition 


Statistical methods are playing an ever-increasing role in framing
suitable policies in a large number of diversified fields covering
natural, physical and social sciences. Formerly dealing only with
affairs of the State, thus accounting for its name, Statistics today
has become indispensable in all phases of human endeavour.


There is a vast body of literature on the subject. Although
the general statistical methods are the same in all fields, this book is
primarily meant for undergraduate students of Commerce and
Economics. While writing this text, I have gone through the syllabi
and examination papers of almost all universities where the subject
is taught so as to make it as comprehensive as possible. A large
number of solved illustrations (292 to be precise), mainly from
examination papers, have been included in the text. Furthermore,
to help students gain proficiency in solving a diverse variety
of problems, a large number of properly graded questions and
problems (541 to be exact), mainly from examination question
papers of various universities, are given at the end of each chapter.
Answers to the problems, along with hints where necessary,
have also been provided.


The lucidity of style and simplicity of expression have been
my twin objectives to remove the awe which is usually associated with


I am deeply indebted to my distinguished teacher,
Dr. C. B. Gupta, Head of the Deptt. of Commerce, Shri Ram College
of Commerce, who has always been a source of help, guidance and
inspiration to me. I also owe a great deal to my teachers,
Dr. R. L. Gulati of the Faculty of Mathematics, University of Delhi,
Mr. K. R. Rao (Statistics), Delhi Cloth Mills, Shri J. K. Gautam,
Head of Deptt. of Commerce, Rajdhani College, and Ss. Vidya
Ratan and B. K. Lele of S. B. College, who have taught me the
subject. I acknowledge with thanks the valuable suggestions of
my friends, Mr. R. N. Goel of S. R. College, Mr. Y. P. Sabharwal,
Dr. N. K. Kakkar and Dr. D. B. Gupta of Ramjas College and
Mr. Davender Gupta of Rajdhani College.


Any suggestions for the improvement of the book shall be
highly welcome and gratefully acknowledged.
S. P. GUPTA
S. R. College of Commerce,
Delhi University, Delhi.
35th June, 1969 


STATISTICS—What and Why 


Meaning. We often come across statements like “The population
of India has increased from 439.2 million in the year 1961 to SA million
in 1971”, “An outlay of Rs. 53,411 crores is envisaged for the Fifth
Five-Year Plan; of this, Rs. 37,250 crores is in respect of the public sector and
Rs. 16,161 crores for the private sector”, “The total production of rice
during the Fifth Plan is targeted at 254 million tonnes as against the actual
production of 208 m. tonnes during the Fourth Plan”, “About 30,000
students are likely to seek admission to colleges of Delhi University
this year”, “The pass percentage at the All-India Higher Secondary Examination
stood at 85.1 in 1975 compared to 75 in the previous year”. Such statements are
quite common in classroom lectures, daily newspapers, reports, speeches,
books and on the radio. Since these statements contain figures they may
be called numerical statements of facts. They are highly convenient
forms of communication; at the same time they are clear, precise and
meaningful. Analysis of such statements helps one in arriving at certain
conclusions. For example, if the present capacity of the University is
only 27,000 students, about 3,000 applicants would be left without seats;
hence two new colleges should be opened or seats increased in the existing
colleges, otherwise there may be student unrest, strikes, etc.
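Analysing such statements is mostly simple arithmetic. As an illustrative sketch only (it is not part of the original text), the short Python program below checks that the quoted public- and private-sector outlays add up to the stated total, expresses each as a share of the Plan, computes the planned rise in rice production, and works out the seat shortfall implied by the assumed capacity of 27,000 students. The constant and helper names are hypothetical.

```python
# Figures quoted in the numerical statements above.
PUBLIC_OUTLAY = 37_250   # Rs. crores, Fifth Plan, public sector
PRIVATE_OUTLAY = 16_161  # Rs. crores, Fifth Plan, private sector
TOTAL_OUTLAY = 53_411    # Rs. crores, total envisaged outlay

RICE_FOURTH_PLAN = 208   # million tonnes, actual production
RICE_FIFTH_PLAN = 254    # million tonnes, target

APPLICANTS = 30_000      # students likely to seek admission
CAPACITY = 27_000        # assumed capacity of the University


def share_percent(part, whole):
    """Percentage of `whole` accounted for by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


def percent_change(old, new):
    """Percentage change from `old` to `new`, to one decimal place."""
    return round(100 * (new - old) / old, 1)


# The two sectoral outlays should add up to the stated total.
assert PUBLIC_OUTLAY + PRIVATE_OUTLAY == TOTAL_OUTLAY

public_share = share_percent(PUBLIC_OUTLAY, TOTAL_OUTLAY)
private_share = share_percent(PRIVATE_OUTLAY, TOTAL_OUTLAY)
rice_growth = percent_change(RICE_FOURTH_PLAN, RICE_FIFTH_PLAN)
seat_shortfall = max(APPLICANTS - CAPACITY, 0)  # never negative

print(f"Public sector share of outlay:  {public_share}%")   # 69.7%
print(f"Private sector share of outlay: {private_share}%")  # 30.3%
print(f"Planned rise in rice output:    {rice_growth}%")    # 22.1%
print(f"Students without seats:         {seat_shortfall}")  # 3000
```

Read this way, the public sector accounts for roughly 70% of the planned outlay, rice output is targeted to rise by about 22%, and some 3,000 applicants would be left without seats, which is the basis for the conclusion drawn in the paragraph.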


Facts and figures about any phenomenon, whether they relate to
population, production, national income, profits, sales, births, deaths, etc.
or any other matter, are called ‘Statistics’. In this sense, the term
‘Statistics’ is considered synonymous with ‘figures’. To a layman the
term ‘Statistics’ usually carries only the nebulous and too often distasteful
connotation of ‘figures’.

But in addition to meaning numerical facts, ‘Statistics’ refers to a
subject, just as ‘mathematics’ refers to a subject as well as to symbols,
formulae and theorems, and ‘accounting’ refers to principles and methods
as well as to accounts, balance sheets and income statements. In this
sense Statistics is a body of methods of obtaining and analysing data in
order to base decisions on them. It is a branch of scientific method used in
dealing with phenomena that can be described numerically either by counts
or by measurements. Thus the word statistics refers either to quantitative
information or to a method of dealing with quantitative information.

However, it is in the second sense (statistical methods) that the
word statistics is used in this text, except in a few places where the
context makes it quite clear that the facts-and-figures sense is intended, for
example, in the phrase ‘statistical data’.


The methods by which statistical data are analysed are called statistical
methods, although the term is sometimes used more loosely to cover
the subject ‘Statistics’ as a whole. The mathematical theory which is the
basis of these methods is called the theory of statistics or mathematical
statistics. Statistical methods are applicable to a very large number of fields:
economics, sociology, anthropology, business, agriculture, psychology,
medicine and education all lean heavily upon statistics. Numerous books
have been written on business statistics, agricultural statistics, industrial
statistics, medical statistics, educational statistics, psychological statistics
and other specific areas of application. It is true, of course, that these
diversified fields demand somewhat different and specialized techniques for
particular problems, yet the fundamental principles that underlie the
various methods are identical regardless of the field of application. This
will become evident to the reader if he realizes that statistical methods in
general are nothing but a refinement of everyday thinking.* They are
especially appropriate for handling data which are subject to variation that
cannot be fully controlled by experimental method and for which we can
have only a fraction of the totality of observations which may exist.


It should be noted at the very outset that Statistics is usually not
studied for its own sake; rather, it is widely employed as a tool, and a
highly valuable one, in the analysis of problems in natural, physical and
social sciences. In the last of these, Statistics often assumes its greatest
importance in the study of economics and business. Statistical methods
are used by governmental bodies, private business firms and research
agencies as an indispensable aid in (1) forecasting, (2) controlling and
(3) exploring.


Statistical methods range from the most elementary descriptive
devices, which may be understood by the common man, to complicated
mathematical procedures, which can be apprehended only by expert
theoreticians. The purpose of this text is to discuss the fundamental
principles and techniques of Statistics in a simple and easily comprehensible
manner without going into the highly mathematical aspects of the subject.


ORIGIN AND GROWTH OF STATISTICS 


Origin. In its present usage the word ‘statistics’ is barely a century
old. However, in a general way, it has been in use for a much longer
period and it may be interesting to trace the roots of the word itself.


The words ‘statist’, ‘statistics’ and ‘statistical’ are derived from the
Latin word ‘Status’, the Italian ‘Stato’ and the German ‘Statistik’, each meaning
a political state. The first term, viz., ‘statist’, is of much earlier origin
than the other two. It is found, for instance, in the famous plays of
Shakespeare, namely, Hamlet (1602) and Cymbeline (1610). The word
‘statistics’ is known to have been used for the first time in the Elements of
Universal Erudition by Baron J.F. von Bielfeld, translated by W. Hooper,
M.D. (3 Vols., London, 1770). One of its chapters is entitled ‘Statistics’
and contains a definition of the subject as “The science that teaches us what
is the political arrangement of all the modern States of the known world.”


The science of statistics is said to have originated from two main 
sources : 


*Freund and Williams : Modern Business Statistics. 
1. Governmental Records, and

2. Mathematics. 

1. Governmental Records. This is the earliest foundation because all
cultures with a recorded history had recorded statistics, and the recording,
as far as is known, was done by agents of the Government for governmental
purposes. Thus, in ancient Egypt, the police prepared registration
lists of all heads of families. In ancient Judea, a census of population
was taken on several occasions, including one in 2030 B.C. when the
population was estimated at 3,800,000. The first Roman census was
taken in 435 B.C. Statistics were also recorded in Roman times about
military strength, taxable capacity of the people, births and deaths, etc.
Since statistical data were collected for governmental purposes, Statistics
was then described as the ‘science of kings’ or ‘the science of statecraft’.
Beginning with the sixteenth century, a large number of statistical
handbooks were published. Many prominent people such as Captain John
Graunt (1620–1674), William Petty (1623–1687) and Henry contributed
a great deal to the development of Statistics. William Petty was the author
of Essay on Political Arithmetick (1690). He regarded Statistics as political
arithmetic.


2. Mathematics. The present body of statistical methods, particularly
those concerned with drawing inferences about a population from a
sample, is based on the mathematical theory of probability, which marked
a major step in the intellectual history of the world. The theory emerged
in the seventeenth century as a result of gambling among the nobility of
France and England. The gamblers of the 17th century attracted the
attention of such men as De Moivre, Fermat and Galileo, and of famous
mathematicians like James Bernoulli, Daniel Bernoulli, Laplace and Gauss, who
discovered and developed the theory of probability while estimating the
chances of winning or losing in gambling. The modern statistician, like the
gambler of the past, is engaged in calculating the risks associated with a
particular decision or course of action. The actual outcome in any single
trial is unknown, but the theory of probability indicates what will happen
if a very large number of such trials are undertaken. The famous
mathematician De Moivre (1667–1754) discovered the normal curve, which
forms an important part of modern statistical theory. Laplace (1749–1827)
and Gauss (1777–1855) independently arrived at the same results
as De Moivre. The great mathematician Quetelet (1796–1874) discovered
the fundamental principle ‘the constancy of great numbers’, which is
the basis of sampling. Much of the development in statistical techniques
has taken place during the last century. The famous statisticians Francis
Galton, A.L. Bowley, Edgeworth, Karl Pearson, William S. Gosset,
R.A. Fisher, Yates and several others have contributed substantially to
the development of statistical methods, and it is because of the efforts of
these people that Statistics has reached its present heights as a body of
knowledge.
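The point that probability says little about a single trial but a good deal about a large number of trials can be illustrated by simulation. The Python sketch below is an illustration added here, not a method from the text: it assumes a fair coin and uses a fixed random seed (an arbitrary choice) so that runs are reproducible. As the number of tosses grows, the proportion of heads tends to settle close to 0.5, even though the outcome of any one toss remains unpredictable.

```python
import random


def proportion_of_heads(n_tosses, seed=1):
    """Simulate n_tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} tosses: proportion of heads = {proportion_of_heads(n):.4f}")
```

With only 10 tosses the proportion can be far from one half; with 100,000 tosses it is typically within a fraction of a per cent of it.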


Growth of Statistics. Although statistics originated as a science of
kings, there has been a phenomenal development in the use of statistics
in several varied fields. Statistics is now regarded as one of the most
important tools for taking decisions in the midst of uncertainty. In fact,
there is hardly any branch of science today that does not make use of
Statistics. The following are the two main factors which are responsible
for the development of Statistics in modern times:


1. Increased demand for Statistics. In the present century considerable
development has taken place in the field of business and commerce,
governmental activities and science. Statistics helps in formulating suitable
policies, and as such its need has been increasingly felt in all these
spheres.


Taking the case of business, not only has the magnitude of business
considerably increased but the growing size of business has made its
problems more complex. Most of these problems are resolved in the light of
factual information and hence the need for statistics.


Taking governmental activities, there was a time when maintenance
of law and order was considered to be the primary function of the government
and the policy of laissez faire (i.e. non-interference in economic
matters) was supreme in the field of public policy. Today there is hardly
any sphere which the government has not entered. With this enlargement
of the functions of government, the demand for statistics has also
increased.


Coming to the sciences, one finds tremendous advancement in the
existing sciences and also the development of many new branches of science.
Extensive research work is now being undertaken by many more persons
than a century ago. Since statistics is a tool of research, the
demand for statistics has greatly increased in this sphere as well.


2. Decreasing costs of Statistics. The time and cost of collecting
data are very important limiting factors in the use of statistics. However,
with the development of electronic machines, such as calculators, computers,
etc., the cost of analysing data has considerably gone down. This
has led to the increasing use of statistics in solving various problems.


Moreover, with the development of statistical theory the cost of
collecting and processing data has gone down. For example, considerable
advance has been made in sampling techniques, which enable
us to know the characteristics of a population by studying only a part of
it. Since 1935, a branch of statistics known as the design of experiments has
made rapid progress and it is now possible to collect and analyse statistics
more promptly and economically.


Of great interest, even to the non-specialist in statistics, is the fact that
much of the basic progress in statistical theory of the past few decades
can be attributed directly to a single individual, Sir Ronald Fisher (born
1890). As one writer puts it: “Fisher is the real giant in the development
of the theory of Statistics.” Although hundreds of scholars have contributed
to the science of Statistics, Fisher must be credited with at least half
of the essential and important developments in the theory as it now stands.


However, an impression should not be formed that the theory of 
statistics is complete and final. In spite of the developments, the list of 
unsolved statistical problems is long and statistical research today is more 
vigorous than ever before. 
STATISTICS DEFINED

There have been many definitions of the term ‘statistics’—indeed 
scholarly articles have carefully collected together hundreds of definitions. 
Some have defined Statistics as statistical data (plural sense) whereas others 
as statistical methods (singular sense). A few definitions are analytically 
examined below : 
Statistical Data 

Quantitative or numerical information may be found almost every- 
where in business, economics and many other areas. It is probably more 
common to refer to data in quantitative form as statistical data. But not 
all numerical data are statistical, and hence it is necessary to examine a few
definitions of statistics to understand the characteristics of statistical data. 

Webster defined statistics as “the classified facts respecting the con- 
dition of the people in a State...especially those facts which can be stated in 
numbers or in tables of numbers or in any tabular or classified arrangement." 


The above definition is too narrow as it confines the scope of statistics
to only such facts and figures which relate to the conditions of the
people in a State.


Yule and Kendall defined statistics as follows: “By Statistics we mean
quantitative data affected to a marked extent by multiplicity of causes.”


This definition is less comprehensive than the one given by Prof. 
Horace Secrist who defined statistics as follows : 


“By Statistics we mean aggregates of facts affected to a marked extent
by multiplicity of causes, numerically expressed, enumerated or estimated 
according to reasonable standards of accuracy, collected in a systematic 
manner for a pre-determined purpose and placed in relation to each other." 


This definition clearly points out certain characteristics which nume- 
rical data must possess in order that they may be called Statistics. These 
are as follows : 


1. Statistics are aggregates of facts. Single and isolated figures are
not statistics for the simple reason that such figures are unrelated and
cannot be compared. To illustrate, if it is stated that the income of Mr.
‘X’ is Rs. 10,000 per annum, this would not constitute statistics although it
is a numerical statement of fact. Similarly, a single figure relating to
production, sale, birth, death, employment, purchase, accident, etc.,
cannot be regarded as statistics, although aggregates of such figures would be
statistics because of their comparability and relationship as parts of a
common phenomenon.

2. Statistics are affected to a marked extent by multiplicity of causes. 
Generally speaking, facts and figures are affected to a considerable extent 
by a number of forces operating together. For example, statistics of pro- 
duction of rice are affected by the rainfall, quality of soil, seeds and 
manure, method of cultivation, etc. It is very difficult to study separately 
the effect of each of these forces on the production of rice. The same is 
true of statistics of prices, imports, exports, sales, profits, etc. In the 
experimental sciences like Physics and Chemistry it is possible to isolate
the effect of various forces on a particular event. Ways and means are
also being devised in ‘Statistics’ for segregating the effect of various forces
on an event. However, it has proved to be a difficult task in statistical
studies of phenomena which are influenced by a complex variety of factors,
many of which are not measurable.


3. Statistics are numerically expressed. All statistics are numerical
statements of facts, i.e., expressed in numbers. Qualitative statements such
as ‘The population of India is rapidly increasing’; or ‘The production of
wheat is not sufficient’; or ‘India is a poor country’ do not constitute
statistics. The reason is that such statements are vague and one cannot
make out anything precise from them. On the other hand, the statement ‘The
population of India increased by 2.5% in 1975 against 2.2% in 1974’ is a
statistical statement.


4. Statistics are enumerated or estimated according to reasonable 
standards of accuracy. Facts and figures about any phenomenon can be 
derived in two ways, viz., by actual counting and measurement or by 
estimate. Estimates cannot be as precise and accurate as actual counts or 
measurements. For example, an estimate that 5 lakh people witnessed 
the Republic Day parade does not mean exactly 5 lakhs ; it may be a few 
hundreds or thousands more or less. On the other hand, if we count the 
number of students in a class and say that there are 60 students, this figure 
would be 100% accurate. In many cases, 100% accuracy of numbers may 
be difficult to attain. The degree of accuracy desired largely depends upon 
the nature and object of the enquiry. For example, in measuring heights 
of persons even inches are material whereas in measuring distance between 
two places, say, Delhi and Bombay, even furlongs can be ignored. Hence, 
in many statistical studies mathematical accuracy cannot be attained. 
However, it is important that reasonable standards of accuracy should 
be attained, otherwise numbers may be altogether misleading. 


5. Statistics are collected in a systematic manner. Before collecting 
statistics a suitable plan of data collection should be prepared and the work 
carried out in a systematic manner. Data collected in a haphazard manner 
would very likely lead to fallacious conclusions. 


6. Statistics are collected for a pre-determined purpose. The purpose
of collecting data must be decided in advance. The purpose should
be well defined and specific. A general statement of purpose is not enough.
For example, if the objective is to collect data on prices, it would not serve
any useful purpose unless one knows whether he wants to collect data on
wholesale or retail prices and which commodities are relevant.


7. Statistics should be placed in relation to each other. If numerical
facts are to be called statistics, they should be comparable. Statistical
data are often compared period-wise or region-wise. For instance, the
population of India in 1971 may be compared with that of earlier years or
with the population of other countries, say U.S.A., U.K., China, etc.
Valid comparisons can be made only if the data are homogeneous, i.e.,
relate to the same phenomenon or subject, and only likes are compared
with likes. It would be meaningless to compare the height of elephants
with the height of human beings.


In the absence of the above characteristics numerical data cannot be 
called statistics and hence “all statistics are numerical statements of facts 
but all numerical statements of facts are not statistics.” 
Statistical Methods

The large volume of numerical information gives rise to the need for
systematic methods which can be used to organise, present, analyse and 
interpret the information effectively. Statistical methods are primarily 
developed to meet this need. 


But what are statistical methods? The term statistics in this sense
too has been defined differently by different writers. A few definitions are 
examined below : 


Prof. A.L. Bowley has given three definitions. At one place he says,
“Statistics may be called the science of counting.” This definition is too
narrow because it covers only one aspect of the science, namely, the collection
of data. Other aspects like analysis, presentation, interpretation, etc.,
are completely ignored.

At another place Bowley says, “Statistics may rightly be called the
science of averages.” This definition also is not satisfactory because averages
are only one of the devices used in statistical analysis. The other devices,
like dispersion, skewness, correlation, etc., are not at all covered by this
definition.

Still another definition given by the same author is “Statistics is the 
science of the measurement of social organism, regarded as a whole in all its 
manifestations." This definition again is inadequate because it confines 
the scope of Statistics only to sociology, i.e., man and his activities. Bowley 
himself recognized this when he remarked, “Statistics cannot be confined 
to any one science.” 

Boddington defines statistics as “the science of estimates and probabi- 
lities.” This definition is also unacceptable because estimates and 
probabilities are only a part of statistical methods. 

Croxton and Cowden have given a very simple and concise definition
of Statistics. In their view “Statistics may be defined as the science of
collection, presentation, analysis and interpretation of numerical data.” This
definition clearly points out four stages in a statistical investigation,
namely: (i) collection of data, (ii) presentation of data, (iii) analysis of
data, and (iv) interpretation of data.

However, to the above stages one more stage may be added and that 
is the organization of data. Thus statistics may be defined as the science of 
collection, organization, presentation, analysis and interpretation of numerical 
data. 

According to the above definition, there are five stages in a statistical 
investigation : 

1. Collection. Collection of data constitutes the first step in a
statistical investigation. Utmost care must be exercised in collecting data
because they form the foundation of statistical analysis. If data are faulty,
the conclusions drawn can never be reliable. The data may be available
from existing published or unpublished sources or else may be
collected by the investigator himself. The firsthand collection of data is
one of the most difficult and important tasks faced by a statistician.
Therefore, as in all scientific pursuits, the investigator must take into
account whatever data have already been collected by others. This would
save the investigator from foreseeable pitfalls, unnecessary labour and
duplication of effort.
2. Organization. Data collected from published sources are generally
in organized form. However, a large mass of figures collected
from a survey frequently needs organization. The first step in organizing
a group of data is editing. The collected data must be edited very carefully
so that omissions, inconsistencies, irrelevant answers and wrong computations
in the returns from a survey may be corrected or adjusted. After
the data have been edited the next step is to classify them. The object of
classification is to arrange the data according to some common characteristics
possessed by the items constituting the data. The last step in organization
is tabulation. The object of tabulation is to arrange the data in
columns and rows so that the data presented are absolutely clear.

3. Presentation. After the data have been collected and organized 
they are ready for presentation. Data presented in an orderly manner 
facilitate statistical analysis. There are two different modes in which the 
collected data may be presented : 

(i) Diagrams, and

(ii) Graphs.

4. Analysis. After collection, organization and presentation the 
next step is that of analysis. A major part of this text is devoted to the 
methods used in analysing the presented data, mostly in a tabular form. 
Methods used in analysing the presented data are numerous, ranging from
simple observation of the data to complicated, sophisticated and highly
mathematical techniques. However, in this text only the most commonly
used methods of statistical analysis are included, such as measures of
central tendency, measures of variation, correlation, regression, etc.


5. Interpretation. The last stage in statistical investigation is interpretation,
i.e., drawing conclusions from the data collected and analysed.
The interpretation of data is a difficult task and necessitates a
high degree of skill and experience. If the data that have been analysed are 
not properly interpreted, the whole object of the investigation may be defea- 
ted and fallacious conclusions be drawn. Correct interpretation will lead to 
a valid conclusion of the study and thus can aid one in taking suitable 
decisions. 
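The organization and analysis stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey returns are hypothetical monthly household incomes in rupees, a missing answer (None) or a negative figure is treated as an error to be removed at the editing stage, and the class width of Rs. 1,000 is an arbitrary choice. Classification places each income in a class interval, tabulation counts the items in each class, and Python's standard `statistics` module supplies a few of the summary measures mentioned under analysis.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical survey returns: monthly household income in rupees.
# None marks an omitted answer; a negative figure is a recording error.
returns = [1200, 850, None, 2300, 640, -50, 1750, 980, 3100, 1420, 760, 2050]


def edit(raw):
    """Editing: discard omitted or impossible entries."""
    return [x for x in raw if x is not None and x >= 0]


def classify(values, width=1000):
    """Classification: map each value to the lower limit of its class."""
    return [(v // width) * width for v in values]


def tabulate(classes, width=1000):
    """Tabulation: (class interval, frequency) pairs in ascending order."""
    counts = Counter(classes)
    return [(f"{lo}-{lo + width - 1}", counts[lo]) for lo in sorted(counts)]


clean = edit(returns)
table = tabulate(classify(clean))

print(f"{'Income (Rs.)':>12}  Frequency")
for interval, freq in table:
    print(f"{interval:>12}  {freq}")

# Analysis: a few common summary measures of the edited data.
print("Mean:", round(mean(clean), 1))
print("Median:", median(clean))
print("Standard deviation:", round(pstdev(clean), 1))
```

With these hypothetical returns, editing removes two faulty entries and leaves ten usable incomes spread over four classes. Interpretation, the fifth stage, still rests with the investigator.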

Since statistical methods help in taking decisions, statistics may
rightly be regarded as “a body of methods for making wise decisions in the
face of uncertainty.”* A modified form of this definition is given by Prof.
Ya-lun Chou, in whose words, “Statistics is a method of decision-making in
the face of uncertainty on the basis of numerical data and calculated risks.”


This modern conception of the subject is a far cry from the one
usually held by laymen. Indeed even the pioneers in statistical research
have adopted it only within the past two decades.


Statistics : Science or Art 

Whether Statistics is a science or an art is often a subject of debate. 
Science refers to a systematised body of knowledge. It studies cause and
effect relationships and attempts to make generalisations in the form of
scientific principles or laws. It describes facts objectively and avoids vague
judgments of good or bad. Science, in short, is like a lighthouse that
gives light to ships so that they can find their own way, but does not indicate
the direction in which they should go. Art, on the other hand, refers to the
skill of handling facts so as to achieve a given objective. It is concerned
with ways and means of presenting and handling data, making inferences
logically and drawing relevant conclusions.

While a century ago there were some misgivings among natural scientists
as to whether statistics had the right to be recognized as a distinct science,
now almost all science is statistical. What this suggests is that the design
of scientific experiments and the evaluation of their results make use of
principles and practices growing out of the science of statistics. However,
statistics as a science is not similar to exact sciences like Physics,
Chemistry, Zoology, etc. This is because statistical phenomena are generally
affected by a multiplicity of causes which cannot always be measured
accurately. In other words, the science of statistics by its very nature is less
precise than the natural sciences. It is a science only in a limited sense, viz.,
as a specialised branch of knowledge. More appropriately, statistics may
be regarded as a scientific method because it is really a tool which can
be used in scientific studies. Wallis and Roberts have rightly remarked
that “Statistics is not a body of substantive knowledge but a body of
methods for obtaining knowledge.”


If science is knowledge, then art is action. Looking from this angle,
Statistics may also be regarded as an art. It involves the application of a
given method to obtain facts, derive results, and finally to use them for
devising action.

FUNCTIONS OF STATISTICS 

The following are the important functions of the science of
Statistics :

1. It presents facts in a definite form.
2. It simplifies masses of figures.
3. It facilitates comparison.
4. It helps in formulating and testing hypotheses.
5. It helps in prediction.
6. It helps in the formulation of suitable policies.


1. Definiteness. Numerical expressions are convincing and, therefore,
one of the most important functions of statistics is to present general
statements in a precise and definite form. Statements of facts conveyed in exact
quantitative terms are always more convincing than vague utterances.
Statistics presents facts in a precise and definite form and thus
helps proper comprehension of what is stated. Consider, for example,
the statement: “The production of wheat in India in 1976 is expected to
be larger than that in 1975”. The reader will not have a clear idea of the
situation from this statement. He would surely like to know the
extent of the increase in wheat production the writer has in mind. On the
other hand, if we quantify the statement as, “The production of wheat
in India is expected to increase from 100 million tonnes in 1971 to 200
million tonnes in 1981”, it conveys definite information. Similarly,
statements like, ‘There is a lot of unemployment in India’ ; The popu-

2. Condensation. Not only does Statistics present facts in a definite
form but it also helps in condensing a mass of data into a few significant
figures. In a way, statistical methods present meaningful overall
information from the mass of data. Thus, it is impossible for one to form a
precise idea about the income position of the people of India from a
record of the individual incomes of the entire population. However, the figure
of per capita income can be easily remembered by everyone. It may be
useful also to have a classification of the population into various
income-groups and present the data as follows :


Monthly Income (Rs.)        Population (Million)
Below 500                           20
500–1,000                           20
1,000–2,000                         16
2,000–5,000                          3
Above 5,000                          1
Total                               70


3. Comparison. Unless figures are compared with others of the
same kind they are often devoid of any meaning. For example, to state
that the population of India in 1971 was 546.9 million hardly means
anything unless the figure is compared with that of earlier periods or with
the population of other countries, say, the U.K., the U.S.A., etc. By furnishing
suitable devices for the comparison of data, Statistics enables a better
appreciation of the significance of a series of figures.


4. Formulating and Testing Hypotheses. Statistical methods are
extremely helpful in formulating and testing hypotheses and in developing
new theories. For example, hypotheses such as whether a particular coin is
fair or not, whether chloromycetin is effective in checking typhoid,
whether the credit squeeze is effective in checking price increases, or whether
students have benefited from the extra coaching can be tested by
appropriate statistical tools.
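
As an illustration of the first of these, the hypothesis that a coin is fair
can be tested by tossing it a number of times and asking how likely a result
at least as lopsided as the one observed would be if heads and tails were
equally likely. The Python sketch below is a minimal version of the exact
two-sided binomial test; the function name fair_coin_p_value and the figures
of 60 heads in 100 tosses are hypothetical, chosen only for illustration.

```python
from math import comb


def fair_coin_p_value(heads: int, tosses: int) -> float:
    """Exact two-sided binomial test of the hypothesis P(heads) = 0.5.

    Hypothetical helper for illustration: returns the probability, under a
    fair coin, of a result at least as far from tosses / 2 as the one observed.
    """
    if tosses <= 0 or not 0 <= heads <= tosses:
        raise ValueError("need tosses > 0 and 0 <= heads <= tosses")
    # Number of heads (or tails) in the nearer tail of the distribution
    k = min(heads, tosses - heads)
    # P(X <= k) for X ~ Binomial(tosses, 0.5)
    one_tail = sum(comb(tosses, i) for i in range(k + 1)) / 2 ** tosses
    # A fair coin gives a symmetric distribution, so both tails are equal;
    # cap at 1 because the two tails overlap when heads == tosses / 2
    return min(1.0, 2 * one_tail)


if __name__ == "__main__":
    p = fair_coin_p_value(60, 100)  # hypothetical: 60 heads in 100 tosses
    print(f"p-value = {p:.4f}")
```

With these figures the p-value comes to roughly 0.057, so at the conventional
5 per cent level the evidence is not quite strong enough to reject the
hypothesis that the coin is fair.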


5. Prediction. Plans and policies of organisations are invariably 
formulated well in advance of the time of their implementation. A know- 
ledge of future trends is very helpful in framing suitable policies and 
plans. Statistical methods provide helpful means of forecasting future 
events. For example, if a businessman has to decide how much he should
produce in 1977, he would like to know the expected sales for that year.
He may use his subjective judgment and make a guess. However, a better
method for him would be to analyse the sales data of past years or to
arrange a statistical survey of the market to obtain the data necessary for
estimating the sales volume of the next year.
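
One simple way of analysing past sales data is to fit a straight-line trend
by the method of least squares and extend it one year ahead. The sketch below
is a minimal version that assumes hypothetical sales figures for 1972 to 1976
and a purely linear trend; the function name linear_trend_forecast is
illustrative only. In practice the choice of trend, and whether a trend should
be used at all, calls for judgment.

```python
def linear_trend_forecast(years, sales, target_year):
    """Fit sales = a + b * year by least squares and extrapolate to target_year."""
    n = len(years)
    if n < 2 or n != len(sales):
        raise ValueError("need at least two matching (year, sales) pairs")
    mean_x = sum(years) / n
    mean_y = sum(sales) / n
    # Slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, sales))
    sxx = sum((x - mean_x) ** 2 for x in years)
    b = sxy / sxx
    # The least-squares line passes through (mean_x, mean_y)
    return mean_y + b * (target_year - mean_x)


# Hypothetical sales (thousand units) for 1972-1976
years = [1972, 1973, 1974, 1975, 1976]
sales = [100, 112, 118, 131, 139]
print(round(linear_trend_forecast(years, sales, 1977), 1))
```

For these assumed figures the fitted slope is 9.7 thousand units a year, which
gives a 1977 forecast of about 149.1 thousand units.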


6. Formulation of Policies. Statistics provide the basic material
for framing suitable policies. For example, it may be necessary to decide
how much wheat India should import in 1977. The decision would depend
upon the expected internal production and the likely demand for wheat in
1977. In the absence of information regarding the estimated domestic
output and demand for wheat, the decision on imports cannot be made
with reasonable accuracy. If we know that domestic production is
likely to be, say, 120 million tonnes and demand would amount to, say,
125 million tonnes, one can easily say that it would be necessary to import
5 million tonnes of wheat. Indeed, statistical data help in more accurate
decision-making than is otherwise possible.


Robert W. Burgess has beautifully summed up the functions of
Statistics thus: “The fundamental gospel of statistics is to push back the
domain of ignorance, rule of thumb, arbitrary or premature decisions,
traditions and dogmatism and to increase the domain in which decisions are made
and principles are formulated on the basis of analysed quantitative facts.”


Scope of Statistics 
The scope of statistics is so vast and ever-increasing that not only it 


subject-matter. It is a tool of all sciences, indispensable to research and
intelligent judgment, and has become a recognized discipline in its own
right. There is hardly any field, whether it be trade, industry or commerce,
economics, biology, botany, astronomy, physics, chemistry, education,
medicine, sociology, psychology or meteorology, where statistical tools
are not applicable. In fact, the greatest victory of mankind in the 20th
century, the landing of Apollo 11 on the moon, would not have been a
success without the help of statistics. The applications of statistics are
so numerous that it is often remarked, “Statistics is what statisticians do.”
Let us examine a few fields in which statistics is applied :


Statistics and the State 


Since ancient times the ruling kings and chiefs have relied heavily on
statistics in framing suitable military and fiscal policies. Most of the
statistics, such as those of crimes, military strength, population, taxes, etc.,
that were collected by them were a by-product of administrative activity.
In recent years the functions of the State have increased tremendously.
The concept of a State has changed from that of simply maintaining law
and order to that of a welfare State. Statistical data and statistical
methods are of great help in promoting human welfare. Statistics today
are not exclusively a by-product of administrative activity; the State
collects statistics on several problems. These statistics help in framing
suitable policies. All Ministries and departments of the Government,
whether they be Finance, Transport, Defence, Railways, Food, Commerce,
Post and Telegraph or Agriculture, depend heavily on factual data for
their efficient functioning. For example, the Transport Department
cannot solve the problem of transport in Delhi unless it knows how many
buses are operating at present, what the total requirement is and, therefore,
how many additional buses should be added to the existing fleet. Statistics
are indispensable not only in times of peace but also in times of war. In
fact, it is impossible to fight a war successfully in the absence of factual
data about enemy strength.


Statistics are so significant to the State that the government in most
countries is the biggest collector and user of statistical data. Such data
are of immense help to many institutions and research scholars who further
process them and arrive at useful conclusions which help in decision-making.

Statistics and Business


With the growing size of firms and ever-increasing competition, the problems
of business enterprises are becoming complex, and they are using more and
more statistics in decision-making. However, the employment of statistical
methods in the solution of business problems belongs almost exclusively to
the 20th century. In earlier days, when business firms were small, owners of
the firms were directly engaged in almost all the areas of business activity.
An owner of a small firm then might act as the stores manager, accountant,
salesman, purchaser, etc. It was possible for him to make personal contacts
with the customers and know exactly what they wanted from him. With the
growth in the size of business firms it has often become impossible for
the owners to maintain personal contact with thousands and lakhs of
customers. Management has become a specialized job and a manager is
called upon to plan, organize, supervise and control the operations of the
business house. Since very little personal contact is possible with customers
these days, a modern business firm faces a much greater degree of
uncertainty concerning future operations than it did when the size of business was
small. Moreover, most of the production these days is in anticipation of
demand and, therefore, unless a very careful study of the market is made,
the firm may not be able to make profits. Thus a businessman who has to
deal in an atmosphere of uncertainty can no longer adopt the method of
trial and error in taking decisions. If he is to be successful in his
decision-making, he must be able to deal systematically with the uncertainty itself
by careful evaluation of the business activities and the application of
statistical methods to them. Business indeed runs on estimates and probabilities.
The higher the degree of accuracy of a businessman's estimates, the greater
is the success attending his business. In recent years it has become
increasingly evident that statistics and statistical methods have provided
the businessman with one of his most valuable tools for decision-making.
Business activities can broadly be grouped under the following
heads :

1. Production,
2. Sale,
3. Purchase,
4. Finance,
5. Personnel,
6. Accounting,
7. Market and Product Research, and
8. Quality control.
With the help of statistical methods, abundant quantitative information
can be obtained in respect of each of the above areas, which can be of
immense use in formulating suitable policies. For example, sampling methods
are used by marketing researchers in making surveys of consumer
preferences over certain brands of competitive merchandise. Similarly, the
technique of statistical quality control helps in maintaining quality
standards without inspecting each and every item. Statistical tables and charts
are frequently used by sales managers to present numerical facts of sales.
Similarly, in deciding what prices to fix for the commodities, statistics are
of great help.
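
A common tool of statistical quality control is the control chart for the
fraction defective (the p-chart). Samples of a fixed size are drawn at
intervals, and a sample whose proportion of defectives falls outside limits
set three standard errors either side of the average proportion is taken as a
signal that the process needs attention. The sketch below is a minimal
version that assumes samples of 100 items and hypothetical counts of
defectives; the function names are illustrative. It computes trial limits
from all the samples, whereas in practice the limits are usually revised once
out-of-control samples have been investigated.

```python
from math import sqrt


def p_chart_limits(defectives, sample_size):
    """Return (LCL, centre line, UCL) for a p-chart with equal sample sizes."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    # Standard error of a sample proportion at the average rate p_bar
    sigma = sqrt(p_bar * (1 - p_bar) / sample_size)
    # Three-sigma limits, clipped to the 0..1 range of a proportion
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return lcl, p_bar, ucl


def out_of_control(defectives, sample_size):
    """Indices of samples whose fraction defective lies outside the limits."""
    lcl, _, ucl = p_chart_limits(defectives, sample_size)
    return [i for i, d in enumerate(defectives)
            if not lcl <= d / sample_size <= ucl]


# Hypothetical: defectives found in ten samples of 100 items each
defectives = [4, 6, 5, 3, 7, 5, 4, 16, 5, 5]
print(p_chart_limits(defectives, 100))
print(out_of_control(defectives, 100))
```

Here the average fraction defective is 0.06 and the upper limit works out to
about 0.131, so only the eighth sample (16 defectives) is flagged; the
judgment rests on the samples alone, without inspecting every item produced.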
The techniques of time series analysis and business forecasting enable
the businessman to predict with a fair degree of accuracy the effect of a
large number of variables. In fact statistics is so highly useful to business
that a prominent business executive and statistician said thirty years ago
that “when the history of modern times is finally written, we shall read it as
beginning with the age of steam and then progressing through the age of
electricity to that of statistics.” This may be only a paradoxical
exaggeration on the part of an over-enthusiastic statistician, but the fact remains
that, consciously or unconsciously, a large part of modern business
is being organized around systems of statistical analysis and control. The
scientific management movement of this century has especially emphasized
the need for collecting facts and interpreting them carefully, as has its
currently popular offspring, ‘operations research’.


However, it should be remembered that though statistical methods
are extremely useful in taking decisions, they are not a perfect substitute for
commonsense. A practitioner of business statistics must, therefore,
combine knowledge of the business environment in which he operates
and its technological characteristics with a heavy dose of commonsense and
the ability to interpret statistical methods to non-statisticians.


Statistics and Economics 

In the year 1890 Prof. Alfred Marshall, the renowned economist,
observed that “Statistics are the straw out of which I, like every other
economist, have to make bricks.” This shows the significance of statistics
in economics. Economics is concerned with the production and
distribution of wealth as well as with the complex institutional set-up connected
with the consumption, saving and investment of income. Statistical data
and statistical methods are of immense help in the proper understanding
of economic problems and in the formulation of economic policies.
In fact, these are the tools and appliances of an economist's laboratory.
For example, what to produce, how to produce and for whom to produce
are questions that need a lot of statistical data, in the absence
of which it is not possible to arrive at correct decisions. Statistics of
production help in adjusting supply to demand. Statistics
of consumption enable us to find out the way in which people of
different strata of society spend their income. Such statistics are very
helpful in knowing the standard of living and taxable capacity of the
people. In the field of exchange we study markets, laws of prices based
on supply and demand, cost of production, banking and credit
instruments, etc. What shall be the price of a particular commodity if its
supply increases or decreases? What price should a monopolist charge
in order to reap the maximum profits? These are questions which
can best be answered with the help of statistics. In fact, statistics
are the very foundation-stone of the theory of exchange. In
distribution, too, statistics play a vital role. How the national income is
to be calculated and how it is to be distributed are
questions which cannot be answered without statistics. In reducing
disparities in the distribution of income and wealth statistics are of
immense help. Similarly, in solving problems of rising prices, growing
population, unemployment, poverty, etc., one has to rely heavily on
statistics. In fact, most economic policies would be a leap in the
dark in the absence of appropriate statistical information.
Statistical methods help not only in formulating appropriate
economic policies but also in evaluating their effect. For example, if
emphasis has been placed on family planning methods in order to check the
ever-growing population, one can ascertain statistically the efficacy of such
methods in attaining the desired goal. Statistics plays such an important
role in the field of economics that in 1926 R.A. Fisher complained
of “the painful misapprehension that statistics is a branch of economics.”


In recent years econometrics, which comprises the application of
statistical methods to economic theory, has been widely used in
economic research. Statistical methods of sampling are useful for
collecting the basic data of economic studies. Statistical methodology also
indicates the reliability of the data and the significance to be attached to
them. The derivation of demand functions, the field in which
econometrics was first applied, continues to be of major
interest to economists. Similarly, production functions, cost functions
and consumption functions present many difficult problems in the
analysis of which statistical tools are of immense use.
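
A simple illustration of how a demand function may be derived statistically
is to assume the constant-elasticity form Q = A * P^b and fit it by least
squares on the logarithms, since log Q = log A + b log P is a straight line.
The sketch below assumes that functional form, and its function name is
illustrative only. Its prices and quantities are synthetic, generated from
A = 1000 and b = -1.5 so that the fit can be checked; real market data would
scatter about the curve, and separating demand from supply effects is a harder
problem than this sketch suggests.

```python
from math import exp, log


def fit_constant_elasticity_demand(prices, quantities):
    """Fit Q = A * P**b by least squares on log Q = log A + b * log P.

    Returns (A, b); b is the estimated price elasticity of demand.
    """
    xs = [log(p) for p in prices]
    ys = [log(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope of log quantity on log price
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    log_a = mean_y - b * mean_x
    return exp(log_a), b


# Synthetic data generated exactly from Q = 1000 * P**-1.5
prices = [2, 4, 5, 8, 10]
quantities = [1000 * p ** -1.5 for p in prices]
A, b = fit_constant_elasticity_demand(prices, quantities)
print(f"A = {A:.1f}, elasticity b = {b:.2f}")
```

On these synthetic figures the fit recovers A = 1000.0 and b = -1.50, which
would mean that a 1 per cent rise in price is associated with a fall of about
1.5 per cent in the quantity demanded.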


Thus economists today are no longer content to theorize in abstract
terms, citing statistics only as needed to support their arguments. Instead
they utilize the excellent data now available to build a sound factual
foundation for their reasoning. Some of the uses of statistics in economics are
as follows :

1. Measures of gross national product and input-output analysis 
have greatly advanced overall economic knowledge and opened up entirely 
new fields of study. 


2. Financial statistics are basic in the fields of money and banking, 
short-term credit, consumer finance and public finance. 


3. Statistical studies of business cycles, long-term growth and seasonal 
fluctuations serve to expand our knowledge of economic instability and 
to modify older theories. 


4. Studies of competition, oligopoly and monopoly require statisti- 
cal comparisons of market prices, cost and profits of individual firms. 


5. Statistical surveys of prices are essential in studying the theories
of prices, pricing policy and price trends, as well as their relationship to
the general problem of inflation.


6. Operational studies of public utilities require both statistical 
and legal tools of analysis. 


7. Analyses of population, land economics and economic geography
are basically statistical in their approach.


8. In solving various economic problems such as poverty, unemployment
and disparities in the distribution of income and wealth, statistical data
and statistical methods play a vital role.

Statistics and Physical Sciences 

The physical sciences, especially astronomy, geology and physics,
were among the fields in which statistical methods were first developed
and applied, but until recently these sciences have not shared the 20th
century developments of statistics to the same extent as the biological
and social sciences. Currently, however, the physical sciences seem to
be making increasing use of statistics, especially in astronomy, chemistry,
engineering, geology, meteorology and certain branches of physics.


Statistics and Natural Sciences 

Statistical techniques have proved to be extremely useful in the
study of all natural sciences like astronomy, biology, medicine,
meteorology, zoology, botany, etc. For example, in diagnosing a disease
correctly the doctor has to rely heavily on factual data like the temperature of
the body, pulse rate, blood pressure, etc. Similarly, in judging the efficacy
of a particular drug for curing a certain disease, experiments have to be
conducted, and success or failure would depend upon the number of
people who are cured after using the drug. In botany, the study of plant
life, one has to rely heavily on statistics in conducting experiments
about plants, the effect of temperature, type of soil, etc. In fact, it is
difficult to find any scientific activity where statistical data and statistical
methods are not used.

Statistics and Research 

Statistics is indispensable in research work. Most of the
advancement in knowledge has taken place because of experiments conducted
with the help of statistical methods. For example, experiments about
crop yields with different types of fertilisers and different types of soils,
or about the growth of animals under different diets and environments, are
frequently designed and analysed with the help of statistical methods.
Statistical methods also affect research in medicine and public health. In fact,
there is hardly any research work today that one can find complete
without statistical data and statistical methods. Also, it is impossible to
understand the meaning and implications of most research findings in
various disciplines of knowledge without having at least a speaking
acquaintance with the subject of statistics.

Statistics and Other Uses 

We have discussed above the significance of statistics in some
important fields. Besides these, statistics are useful to various institutions
and persons such as bankers, brokers, insurance companies, auditors, social
workers, labour unions, trade associations and chambers of commerce. The
banks have to make a very careful study of their cash requirements; otherwise
they may find they are short of cash and their existence is at stake. Similarly,
the premium rates of life insurance companies are based upon a very
careful study of the expectation of life.

These references to statistical applications are not intended to be
exhaustive; they simply suggest the diversity of applications of the
underlying methods and ideas of statistics. In fact, the applications of
statistics are so numerous that statistics today has risen from the science
of statecraft to a science of universal applicability. It is instrumental
in enhancing human welfare and is such a master-key that it enables us to
solve the problems of mankind in almost every field. Most people
make use of statistics, consciously or unconsciously, in taking decisions.
Statistical knowledge is in fact essential for a good citizen. H.G. Wells
was right when he said “Statistical thinking will one day be as necessary for
efficient citizenship as the ability to read and write.”
It must be remembered that the statistical approach, though universal
in its underlying ideas, must be tailored to fit the peculiarities of each concrete
problem to which it is applied. It is dangerous to apply statistics in cookbook
style, using the same recipes over and over, without careful study of the
ingredients of each new problem.*

Also, the reader must understand that statistics is not a dry, abstract
and unrealistic pursuit followed by a small group of highly trained
mathematicians, but rather a vitally important part of the economic and business
life of the community. The usefulness of statistics to the reader depends
to a great extent on his ability to use his imagination in applying this tool
to his own particular situation.

LIMITATIONS OF STATISTICS 

Despite the usefulness of statistics in many fields, the impression should
not be formed that statistics are like magical devices which always provide
the correct solution to problems. Unless the data are properly collected
and critically interpreted, there is every likelihood of drawing wrong
conclusions. Therefore, it is also necessary to know the limitations and the
possible misuses of statistics. The following are the important limitations
of the science of statistics :

1. Statistics does not deal with individuals. Since statistics are
aggregates of facts, the study of an individual fact lies outside the scope
of Statistics. Thus ‘Mr. X is 5′ 8″ tall’ does not constitute a statistical
statement, whereas “The average height of an Indian is five feet” is a
statistical statement. Similarly, if we study only one figure of national
income, population, imports or exports of India, it would not constitute
statistical data unless we have similar figures for other countries or for the
same country for different periods so as to facilitate comparison.

2. Statistics deals only with quantitative characteristics. Statistics
are numerical statements of facts. Such characteristics as cannot be 
expressed in numbers are incapable of statistical analysis. Thus, qualitative 
characteristics like honesty, efficiency, intelligence, blindness and deafness 
cannot be studied directly. However, it may be possible to analyse such 
problems statistically by expressing them numerically. For example, we 
may study the intelligence of boys on the basis of the marks obtained by 
them in an examination. 

3. Statistical results are true only on an average. The conclusions 
obtained statistically are not universally true ; they are true only under 
certain conditions. This is because statistics as a science is less exact as 
compared to natural sciences. 

4. Statistics is only one of the methods of studying a problem.
Statistical tools do not provide the best solution under all circumstances.
Very often, it is necessary to consider a problem in the light of a country's
culture, religion and philosophy. Statistics cannot be of much help in
studying such problems. Hence statistical conclusions should be
supplemented by other evidence.

5. Statistics can be misused.* The greatest limitation of statistics
is that it is liable to be misused. The misuse of statistics may arise
for several reasons. For example, if statistical conclusions are
based on incomplete information, one may arrive at fallacious conclusions.

*Wallis and Roberts, Statistics: A Modern Approach, p. 12.
Thus the argument that drinking beer is bad for longevity, since 99% of
the persons who take beer die before the age of 100 years, is statistically
defective, since we are not told what percentage of persons who do not
drink beer die before reaching that age. Statistics are like clay and they
can be moulded in any manner so as to establish right or wrong
conclusions. In this context, W.I. King pointed out, “One of the shortcomings
of statistics is that they do not bear on their face the label of their
quality.” Moreover, not every Tom, Dick and Harry can deal with statistics.
It requires experience and skill to draw sensible conclusions from the
data; otherwise, there is every likelihood of wrong interpretations. The
very fact that it may lead to fallacious conclusions in the hands of
inexperienced people limits the possibility of mass popularity of such a useful
science. Also, statistics cannot be used to full advantage in the absence of
proper understanding of the subject to which it is applied.
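
The beer argument fails because it looks at only one group. To judge whether
beer drinking shortens life, the proportion dying before 100 among beer
drinkers must be set beside the same proportion among those who do not drink
beer. The short sketch below makes that comparison; the counts are
hypothetical, invented purely for illustration, and the helper name
proportion is likewise illustrative.

```python
def proportion(part, whole):
    """Share of a group, as a fraction of its total."""
    return part / whole


# Hypothetical counts, invented for illustration only
drinkers_total, drinkers_died_before_100 = 10_000, 9_900
abstainers_total, abstainers_died_before_100 = 10_000, 9_910

p_drinkers = proportion(drinkers_died_before_100, drinkers_total)
p_abstainers = proportion(abstainers_died_before_100, abstainers_total)

# The 99% figure alone says nothing; only the comparison between groups does
print(f"beer drinkers: {p_drinkers:.1%}, non-drinkers: {p_abstainers:.1%}")
```

With these assumed figures the proportion dying before 100 is practically the
same in both groups, so the 99 per cent figure, taken by itself, proves
nothing about beer.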


Distrust of Statistics 

By distrust of statistics we mean lack of confidence in statistical
statements and statistical methods. People often comment,
“Statistics can prove anything,” or “There are three types of lies: lies,
damned lies and statistics, wicked in the order of their naming.” The
following three main reasons account for such notions being held by people
about statistics :

1. Figures are convincing and, therefore, people are easily led to
believe them.

2. They can be manipulated in such a manner as to establish
foregone conclusions.

3. Even if correct figures are used, they may be presented in such a
manner that the reader is misled. For example, note the following
statement: “The profits of firm A are Rs. 40,000 for 1975 and those of firm B
Rs. 50,000 for the same period.” On the basis of this information only,
one would form the opinion that firm B is decidedly better than firm A.
However, if we examine the amount of capital invested in both the firms,
the quality of work done, etc., we might reach a different conclusion.
Hence, while making use of statistics one should not only avoid outright
falsehood but also be alert to detect possible distortion of truth.
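
The point about the two firms can be made concrete by relating each firm's
profit to the capital employed. In the sketch below the profit figures are
those of the example, while the capital figures are hypothetical, assumed only
to show how the ranking can reverse; the function name return_on_capital is
illustrative.

```python
def return_on_capital(profit, capital):
    """Profit expressed as a percentage of the capital employed."""
    return 100 * profit / capital


# Profits (Rs.) from the example; capital figures are hypothetical
firms = {
    "A": {"profit": 40_000, "capital": 200_000},
    "B": {"profit": 50_000, "capital": 500_000},
}

for name, f in firms.items():
    rate = return_on_capital(f["profit"], f["capital"])
    print(f"Firm {name}: profit Rs. {f['profit']:,}, "
          f"return {rate:.1f}% on capital")
```

On these assumed figures firm A earns 20 per cent on its capital against
10 per cent for firm B, so A, not B, is the more profitable concern even
though its absolute profit is smaller.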


The various ways in which statistics are often misused shall be
discussed in detail in a subsequent chapter. Suffice it here to say that
Statistics neither proves anything nor disproves anything. It is only a tool,
i.e., a method of approach. Tools, if properly used, do wonders and, if
misused, prove disastrous. The same is true of statistical tools. If used
properly, they help in taking wise decisions, and if misused they can do
more harm than good. But the fault does not lie with the science of
Statistics as such. A few interesting examples can be given to illustrate
the point. Medicines are meant for curing people, but if a wrong medicine
or an excessive dose of a medicine is taken, a person may die. We
cannot blame the medicine for such a result. Similarly, if a child cuts his
finger with a sharp knife, it is not the knife that is to blame, but the person
who kept the knife where the child could get it. These examples
help us in emphasising that if statistical facts are misused by some people,
it would be wrong to blame the science as such. It is the people who
are to be blamed. In fact, statistics are like clay of which one can make a
God or a Devil as he pleases.


Statistical Methods vs. Experimental Methods 


Man acquires knowledge from a variety of sources. In early times,
it was believed that acquiring knowledge was a matter of chance and its
sources were unknown. But the tremendous advance in human knowledge
that has taken place in the last two centuries is mainly due to the adoption
of systematic methods and not just a matter of chance. Such methods as
are used in enlarging knowledge are known as scientific methods.


There are two primary methods employed for advancing knowledge,
namely, experimental methods and statistical methods. Experimental
methods are the best known of scientific methods and have been historically
most fruitful. Under this method, cause and effect relations are often
established or investigated within a controlled set-up in the laboratory.
The experimental methods, however, can be adopted only in the physical
and natural sciences like Physics, Chemistry, Astronomy, etc., wherein it
is possible to isolate individual causes and specific effects for closer
observation and analysis. In most cases the quantitative as well as the
qualitative aspects of a physical phenomenon are also measurable. In social
sciences like Economics, Political Science, Sociology, etc., it is difficult to apply
experimental methods inasmuch as the various forces affecting a particular
phenomenon can neither be studied in isolation nor measured with
precision in all cases. For example, if one were to specify the significance
of the various causes determining the price of wheat in India, the application
of experimental methods would require the study of each causal factor in
isolation from the others and measurement of the intensity of its effect, an
impossible task. In order to determine the effect of a change in the supply of
wheat on its price, it would be necessary to ensure that the tastes of people,
the attitude of traders to a change in supply, the disposable funds of people,
etc., can be studied in isolation. However, these factors are so inextricably bound up
with each other that it would be very difficult to isolate any one of them
for experimental purposes.


However, it would be wrong to ignore altogether those fields where
experimental methods cannot be applied. In such cases resort may be
had to statistical methods. In applying statistical methods a problem is
studied systematically as in the experimental methods, but the system
used is not the same. Here we allow all forces to operate since they
cannot be kept constant. We then record the variations in the forces
and try to determine the part played by each in influencing the result.
Undoubtedly, this method is ordinarily more difficult than the experimental
method and the results are not as accurate, but they are decidedly better
than no results. Pointing out the importance of statistical methods, Croxton
and Cowden have remarked, “Without an adequate understanding of
the statistical methods, the investigator in the social sciences may be like the
blind man groping in a dark room for a black cat that is not there. The
methods of Statistics are useful in an ever-widening range of human activities
in any field of thought in which numerical data may be had.”


However, it should be noted that the distinction between the
experimental methods and the statistical methods is somewhat formal and
arbitrary and should not be taken as anything rigid and definite. In
practice, scientists often combine elements of both experimental and
statistical approaches. As a matter of fact, most of the important statistical
methods originated in the fields of Physics and Astronomy, fields which we
usually consider to be ‘exact’ sciences. On the other hand, even the social
scientist can and does use a certain amount of controlled study in his
investigation. It may also be pointed out that the statistical method is
neither the only method employed in research nor the best approach to
every problem. Just as the carpenter has a number of tools, each appropriate
for a different sort of operation, so also the researcher can avail himself of
various techniques which are the tools of his trade and each of which is
appropriate to a specific type of situation. Which technique or techniques
should be applied in a particular situation would primarily depend upon the
object of investigation. Just as the choice of a wrong tool by the carpenter
is likely to spoil the work, the choice of a wrong method by the statistician
would similarly lead to wrong conclusions.




2 Organising a Statistical Survey


Numerical data constitute the raw material for statistical analysis.
Data can be obtained through a statistical survey, also called a statistical
enquiry or investigation. For example, if a survey is made about the
consumption pattern of a particular class of people, the investigator will
obtain such data as what percentage of income is spent by these
people on food, clothing, shelter, education, fuel and lighting, etc.


A statistical survey may be either a general purpose survey or a special
purpose survey. In a general purpose survey we obtain data which are
useful for several purposes. The best example of this type of survey is the
population census taken every 10 years in India. Such a survey provides
information not only about the total population but about its division
into males and females, literates and illiterates, employed and unemployed,
age distribution, income distribution, etc. A special purpose survey is one
in which the data obtained are useful in analysing a particular problem
only.


A statistical survey passes through several stages before completion,
starting from planning and ending with writing the final report. These 
stages can be summarized under two broad heads : 


I. Planning the survey 
II. Executing the survey


I. PLANNING THE SURVEY 


Proper planning of a survey is of paramount importance because the 
quality of survey results depends considerably on the preparations made 
before the survey is conducted. The matters which require careful consi- 
deration at the planning stage may be enumerated as follows : 


1. Purpose of the survey and the nature of information to be 
collected. 


2. Scope of the survey. 
3. Unit of data collection.
4. Sources of data (i.e., primary, secondary or both). 


5. Technique of data collection (sample or census and, if sample, 
the method of sampling). 


6. Choice of a frame, or construction of a frame, if none is 
available. 

7. Degree of accuracy desired.

8. Miscellaneous considerations. 


* The steps in planning are the same as preliminary steps before data 
collection. 
Obviously, these matters cannot be viewed in isolation from one
another. To a certain extent, the decision relating to any one aspect
is bound to influence decisions on other aspects. Hence, there is a clear
case for an integrated view being taken of the various aspects of an
enquiry. Decisions on particular matters should be tentative and subject
to modification until the plan as a whole has been finalised.

Determination of the precise subject on which information is required, 
the degree of accuracy desired and the ways in which the information can 
be best obtained often constitute the most difficult and crucial part of
planning an enquiry. Thus, extreme care and skill should be brought to 
bear on these aspects at the planning stage of an enquiry. 

1. Specification of the purpose. The objective of a statistical
survey should be clearly set out at the very beginning. This will
invariably indicate the type of information which is needed and the use to
which the information obtained will be put. For example, if the object
of an enquiry is to study the nature of price changes over a period of
time, it would be necessary to collect data on commodity prices, and it
must be decided whether it would be helpful to study wholesale or retail
prices, and the possible uses to which such information could be put.
The object of an enquiry may be either to collect specific information
relating to a problem or to collect adequate data to test a hypothesis or
verify a given proposition. Failure to set out clearly the purpose of an
enquiry is bound to lead to confusion and waste of resources. In the case
of surveys which are likely to provide information that will be of value to
different organizations or government departments, a detailed statement
of the uses to which the information obtained could be put may be
prepared. This will be helpful to all concerned and they might be expected
to make suggestions for suitable modifications before the survey starts.

2. Scope of the survey. Once the purpose of the survey has been
clearly stated, the next step is to decide about the scope of the survey,
i.e., its coverage with regard to the type of information, the subject-matter
and geographical area. For instance, an enquiry relating to industrial
relations may be undertaken with the help of data relating to trade union
membership, industrial disputes, wages of workers, etc., or only with the
help of data on the frequency and severity of strikes. Likewise, an enquiry
may relate to India as a whole, or one particular State, or an industrial
town. The larger the coverage of the investigation, the more representative
the results are likely to be. However, much will depend upon the
purpose.

Three factors exert great influence on scope, namely, the object of
enquiry, availability of time and availability of resources. The
investigation should be carried out within a reasonable period of time;
otherwise the information collected may become out-of-date and have no
meaning at all. For example, if a Pay Commission is set up to recommend
dearness allowance on the basis of the rise in prices, and the Commission
takes two years to submit its report, there is every possibility of its
findings being out-of-date. Delay may also result in losses; for instance,
strikes and lockouts may take place causing considerable loss of output.
Some departments of the Government are notorious for the delay in
publication of information which at times serves little or no purpose because
The scope of an enquiry usually fixes the limits of the enquiry.
However, a certain amount of discretion can always be exercised in this
respect. Careful consideration should, therefore, be given to the inclusion
or omission of marginal categories of information, particularly those on
which the collection of data is likely to be difficult or for which an
adequate frame is lacking.


3. The unit of data collection. Before organising the task of col- 
lecting data the statistical unit or units must be clearly defined for purposes 
of the investigation. The unit in terms of which the investigator counts 
or measures the variables or attributes selected for enumeration, analysis 
and interpretation is known as a ‘statistical unit’. For example, in a
population census the statistical unit is a person. Similarly, if the number of
houses in a particular area is counted then the unit is a house. However,
the problem of defining the unit is not as simple as it appears to be.
To take an example, if we are making a study of the size of sugar mills we
have different criteria of measuring the size of mills, such as capital
employed, number of employees, total production, etc. The investigator has
to select one of these for size classification and then proceed to collect the
necessary information. If capital employed is selected as the basis of
classification, the unit may be the rupee or thousands of rupees. If the number
of workmen is the basis then an employee will be the statistical unit. The
basis of determining the size must, therefore, be clearly defined and the
same definition followed throughout the survey.


While fixing the statistical unit for an enquiry, it is useful to keep in
view the following :


(i) The unit must suit the purpose of the enquiry.
(ii) It should be simple to understand.
(iii) It should be specific.

(iv) It should be stable in character. If the unit changes its
characteristics, e.g., a yard at one time and a metre at another time, the
measurements and counts would be misleading.

(v) The unit should be uniform throughout the study so that there
can be valid comparisons. If units are defined differently at different stages
of the survey, not only would comparisons become difficult but they
might also lead to wrong or even absurd conclusions.
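The point about a stable, uniform unit can be illustrated with a small Python sketch. It assumes, hypothetically, that some lengths were recorded in yards and others in metres; the name `to_metres` and the two records are invented for illustration. Every value is converted to one unit (the metre) before any comparison, using the exact definition 1 yard = 0.9144 metre.

```python
# Hypothetical sketch: bring lengths recorded in different units to one
# uniform statistical unit (the metre) before comparing them.
YARD_IN_METRES = 0.9144  # 1 yard is defined as exactly 0.9144 metre

CONVERSION_TO_METRES = {"metre": 1.0, "yard": YARD_IN_METRES}


def to_metres(value: float, unit: str) -> float:
    """Express a length in metres; raises KeyError for an undefined unit."""
    return value * CONVERSION_TO_METRES[unit]


# Two hypothetical records: the same number, but different units.
records = [(100, "yard"), (100, "metre")]

# Comparing the raw numbers wrongly treats the two lengths as equal.
raw_equal = records[0][0] == records[1][0]

# After conversion the difference becomes visible (about 91.44 m vs 100 m).
converted = [to_metres(value, unit) for value, unit in records]
```

Raising an error for an unknown unit, rather than guessing, mirrors the requirement that the unit be clearly defined before data are compared.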


A statistical unit may be (a) arbitrary, or (b) conventional. In other 
words, it may be used in a special sense or in the sense prevalent in
common usage. But whatever be the sense in which the unit is used, it is
essential that the meaning should be clear and unambiguous. 


Types of statistical units. The statistical units can be broadly 
classified under two heads :


(i) Units of collection. 
(ii) Units of analysis and interpretation. 
(i) Units of collection. These are those in terms of which data are
collected. They involve either counting or measurement, the former being
employed in the case of physical items and the latter in respect of
quantitative attributes. In the process of collection, therefore, one may
deal either with discrete entities and events relating to them, as in the case
of persons, houses, livestock, number of accidents and number of deaths,
or with measurable quantities and value units such as tonnes, kilograms
and litres.

The units of collection may be simple, compound or complex. A
simple unit is one which represents a single condition without qualifications.
Examples of such units are a house, a child and a rupee. A compound
unit is a simple unit the comprehension of which is subject to some
qualification. Thus, the simple unit ‘worker’ may be qualified as ‘skilled
worker’, in which case we should know not only the meaning of ‘worker’ but
also that of the term ‘skilled’ in relation to worker. Similarly, we may talk
of ‘married man’, ‘part-time employee’ and ‘machine-hour’. A complex
unit is formed by adding to a simple unit two or more qualifications.
Examples of such units are ‘a unit of production per machine-hour’ and
‘value of gold reserves against bank notes’.

(ii) Units of analysis and interpretation. Statistical data are generally 
collected for making comparisons. Comparisons can be made either with 
reference to time or space. Units of analysis and interpretation are those 
which facilitate comparisons. They include (a) rates, (b) ratios and per- 
centages, and (c) coefficients. 

‘Rates’ are used in those cases where comparisons are made between
quantities of different kinds, i.e., where the numerator and the
denominator are not of the same kind, such as birth rates and death rates.
Rates are usually expressed per thousand. However, there is no hard
and fast rule about it. ‘Ratios and percentages’ are used where quantities
to be compared are of the same kind, for instance, ‘The ratio of literates
to illiterates is 1 : 4’ or ‘Literates are 20% of the population’. Rate
per unit is called a ‘coefficient’. For example, if it is stated that the
death rate in India at present is 1.6 per cent or 16 per thousand, it means
that the coefficient of deaths is 0.016. If this coefficient (0.016) is
multiplied by the total population we obtain the total number of deaths.
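The arithmetic behind rates, ratios, percentages and coefficients can be sketched in a few lines of Python. The population of 10,000 with 160 deaths and the literacy figures are hypothetical, chosen only so that the results match the 16 per thousand, 1 : 4 and 20% figures used in the text; the function names are likewise illustrative.

```python
def rate_per_thousand(events: int, population: int) -> float:
    """Rate: numerator and denominator are of different kinds."""
    return 1000 * events / population


def percentage(part: int, whole: int) -> float:
    """Percentage: part and whole are of the same kind."""
    return 100 * part / whole


def coefficient(rate: float, per: int = 1000) -> float:
    """Coefficient: the rate expressed per single unit."""
    return rate / per


# Hypothetical figures for illustration only.
deaths, population = 160, 10_000
r = rate_per_thousand(deaths, population)   # 16 deaths per thousand
c = coefficient(r)                          # coefficient of deaths, 0.016
total_deaths = c * population               # recovers the number of deaths

literates, illiterates = 2_000, 8_000
ratio = literates / illiterates             # 0.25, i.e. 1 : 4
share = percentage(literates, literates + illiterates)  # 20 per cent
```

Multiplying the coefficient back by the population recovers the count of deaths, which is exactly the use of a coefficient described above.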


4. Sources of data.* After the purpose and scope have been
defined, the next step is to decide about the sources of data. The sources
of information may be either primary or secondary. When the investigator
collects first-hand data for the purpose at hand, such data are known as
primary data. On the other hand, if he obtains the data from published
or unpublished sources, such data will constitute secondary data for him.
Quite often it is necessary to make use of both the sources in a particular
investigation. However, much depends upon the purpose and scope of
investigation.

5. Technique of data collection.† There are two important
techniques of data collection, namely (i) census technique; and (ii) sample
technique. A census is a complete enumeration of each and every unit
of the universe whereas in a sample only a part of the universe is studied
and conclusions about the entire universe are drawn on that basis. For
example, if the data about the consumption pattern of the people of Delhi
are to be collected, an investigator has to decide whether the heads of all
families are to be contacted or the heads of only a few families. In the
former case, it is the census method whereas in the latter it is the sample
method. The census method is costlier and more time-consuming as
compared to the sample method. The investigator must decide which
technique he will use. The choice would depend upon a number of factors
such as: (i) the availability of resources, (ii) the time factor, (iii) the
degree of accuracy desired, and (iv) the nature and scope of the problem.


* For details please refer to Chapter 3.
† For details please refer to Chapter 4.
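The contrast between the census and sample techniques can be sketched in Python. The sketch assumes a hypothetical universe of 1,000 households with randomly generated monthly food expenditure, and uses simple random sampling without replacement as the sample method; the text does not prescribe a particular sampling method here.

```python
import random
import statistics


def census_mean(universe):
    """Census technique: every unit of the universe is enumerated."""
    return statistics.mean(universe)


def sample_mean(universe, n, seed=None):
    """Sample technique: only n units, drawn by simple random sampling
    without replacement, are studied; their mean estimates the universe mean."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(universe, n))


# Hypothetical monthly food expenditure (Rs.) of 1,000 households.
gen = random.Random(1)
universe = [gen.randint(500, 5000) for _ in range(1000)]

print("census (1,000 heads contacted):", census_mean(universe))
print("sample (50 heads contacted):   ", sample_mean(universe, 50, seed=2))
```

The sample estimate costs one-twentieth of the enumeration effort but will generally differ somewhat from the census figure; how much difference is acceptable is the "degree of accuracy desired" factor listed above.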


6. The frame. The term ‘frame’ refers to a list, map or other
specification of the units which constitute the available information
relating to the population designated for a particular survey scheme. For
example, if we want to find out the capital invested and the number of
workers working in small-scale industries in Delhi, we must have a
complete list of names and addresses of all the small-scale firms. This
list of names and addresses will be called a ‘frame’. The whole structure
of the enquiry is to a considerable extent determined by the frame. The
method of survey which is suitable for a given type of material may be
different in different territories because different types of frames have to be
used. Consequently, until the nature and accuracy of the available frames
are known, detailed planning of the survey cannot be undertaken. If
no frame exists, the construction of a frame suitable for the purpose of
the survey will constitute a major part of planning. Various types of
defects may exist in the available frames: frames may be (i) inaccurate,
(ii) incomplete, (iii) subject to duplication, (iv) inadequate, or (v)
out-of-date. It is, therefore, essential at the outset of the survey to carry out a


Such an investigation will naturally commence with a study of the 
administrative machinery by which the frame has been constructed and by 
which it is kept up-to-date and may also have to include a certain amount
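Some of the frame defects listed above can be screened mechanically. The following Python sketch assumes a hypothetical frame of small-scale firms stored as records with `name`, `address` and `verified` (year last checked) fields; the field names, the firms and the three-year staleness limit are all invented for illustration. It flags duplication, incompleteness and out-of-date entries; inaccuracy and inadequacy can only be detected by checking the frame against the field.

```python
from collections import Counter


def audit_frame(frame, current_year, max_age=3):
    """Flag duplicated, incomplete and out-of-date entries in a frame.
    Field names 'name', 'address' and 'verified' are hypothetical."""
    # Normalise name and address so trivial spelling variants count as duplicates.
    keys = [
        (e["name"].strip().lower(), (e.get("address") or "").strip().lower())
        for e in frame
    ]
    counts = Counter(keys)
    duplicates = sorted({name for (name, _), c in counts.items() if c > 1})
    incomplete = [e["name"] for e in frame if not e.get("address")]
    out_of_date = [e["name"] for e in frame if current_year - e["verified"] > max_age]
    return {"duplicates": duplicates, "incomplete": incomplete, "out_of_date": out_of_date}


# Hypothetical frame entries for illustration only.
frame = [
    {"name": "Asha Textiles", "address": "Okhla", "verified": 1976},
    {"name": "asha textiles ", "address": "Okhla", "verified": 1976},
    {"name": "Bharat Tools", "address": "", "verified": 1977},
    {"name": "Crown Plastics", "address": "Naraina", "verified": 1970},
]
report = audit_frame(frame, current_year=1977)
```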


about the degree of accuracy that he wants to attain. It may be pointed
out that absolute accuracy is seldom possible in statistical work because
(i) statistics are based on estimates; (ii) tools of measurement are not
always perfect; and (iii) there may be unintentional bias on the part of
the investigator, enumerator or informant. Hence, an attempt to attain
100% accuracy would not be realistic. Even where perfect
accuracy is possible it may not be worth the time and money likely to be
spent in attaining it. Degree of accuracy desired primarily depends upon

grams has no significance. Riggleman and Frisbee rightly pointed out that
“The necessary degree of accuracy in counting or measuring depends
upon the practical value of the accuracy in relation to its cost.” However,
it does not mean that one should sacrifice accuracy to keep down the
costs. It would ultimately depend upon the purpose of investigation.
Quite often, the investigator finds it desirable to collect data quickly and
have approximate results rather than spend a lot of time and money
attaining a slightly higher degree of accuracy. It is, therefore, desirable
that an eye be kept on the possible inaccuracies that are likely to arise
due to clerical and other types of errors so that they may be eliminated

8. Miscellaneous considerations. Consideration should be given
to various other matters such as whether the enquiry is (a) official,
semi-official or non-official, (b) confidential or non-confidential, (c) regular
or ad hoc, (d) initial or repetitive, and (e) direct or indirect.

An official survey is conducted by or on behalf of Central, State or
Local Governments, a semi-official enquiry by such bodies as enjoy
government patronage, and a non-official enquiry by private bodies or
individuals. The facilities available will naturally differ according to the
nature of the enquiry. In a legal (official) enquiry people may be compelled
to supply information; in a semi-official enquiry people may be requested
and information may be available without much difficulty, whereas in a
non-official enquiry the investigator may have to face the greatest difficulty
in collecting data.

A confidential survey is one the results of which are kept secret
and are not made known to the general public. On the other hand, a
non-confidential survey is one the results of which are published, i.e.,
the results of such a survey are open to the general public.


A regular survey is one in which data are collected at regular
intervals over a period of time whereas in an ad hoc survey data are
collected as and when necessary without any regularity.

An initial survey is one that is carried out for the first time whereas
a repetitive survey is one that is conducted in continuation of previous
enquiries. In case of the former, it is necessary to formulate a plan of 
data collection whereas in the latter case such a plan already exists and 
may only need modification in the light of past experience. 


A direct survey is one where data are capable of direct quantitative
measurement such as height, weight and income. On the other hand, in
an indirect enquiry direct quantitative measurement is not possible, as,
for example, with intelligence, efficiency and honesty. In the latter case,
one has to take up certain objective measurable phenomena which reflect
the qualitative phenomenon, and then proceed to collect data.

The matters discussed above are in the nature of precautions
intended to ensure that the material obtained is reliable. Some of these
precautions are nothing more than common sense, but are nonetheless worth
noting, for if they are neglected, the results may be completely useless.


II. EXECUTING THE SURVEY


After a plan of data collection has been prepared, the next step is to
execute the survey. The various phases of the work subsequent to the
planning stage may be enumerated as follows :


1. Setting up an administrative organisation.

2. Design of forms.

3. Selection, training and supervision of the field investigators.

4. Control over the quality of the field work and field edit.

5. Follow-up of non-response.

6. Processing of data.

7. Preparation of report.


1. Setting up an administrative organisation. The administrative
organisation required for an enquiry will depend very much on the nature and
scope of the enquiry. Every opportunity should be taken to utilise
existing administrative and office organisation. When the enquiry covers a
large area, supervision from a central office is likely to be difficult and
in such cases it is best to establish regional offices. Very frequently, some
existing organisation can be used for this purpose.

2. Design of forms. Careful attention should be given to the
design of the various forms that will be used in the course of the enquiry,
especially the questionnaire.


3. Selection, training and supervision of field investigators. In most
surveys the data are to be collected through enumerators who work
part-time or full-time. The nature of the enumerator’s job is such that great
care has to be exercised in his selection. In fact, since the very success of
a survey depends largely upon the field investigators, it is essential that they
are properly selected, thoroughly trained and their work closely supervised.
Field investigators may be specially appointed, or they may consist of the

The enumerators should be honest, intelligent and hardworking,
and be able to create a friendly atmosphere and put the respondent
at ease. An enumerator must speak the language of the respondents, ask
questions properly and intelligently and record the responses accurately
and completely. By his friendly and courteous attitude he should
be tactful enough to discourage irrelevant conversation and elicit
responses from those who are apparently unwilling to co-operate. Since
field work is a strenuous job, there should be adequate provision for
rest and the salary should also be attractive. Suitable tests like
intelligence tests, aptitude tests, etc. can be employed for selecting the right
type of enumerators.


After having selected the enumerators through suitable tests, the
next problem is that of making arrangements for the proper training of
the enumerators. The enumerators should know the purpose of the
survey and how the results may possibly be used. The manner in which

explained at length with examples. They should know the definitions of 
the terms used in the questionnaire or schedule and the intricate prob- 
lems involved in using them in the field. Mock interviews can be used 
in the classroom, the instructor and the student assuming the roles 
of respondents and interviewers respectively. The instructor brings up 
difficult situations and the whole class fills in questionnaires which are later


the questionnaires. The data collected under supervision are examined 
by the instructors, who determine whether reasonable standards have been 
maintained. Those who are found up to the standard can be asked to
begin their job.

The training may also be given with the help of instruction manuals. 
These manuals explain clearly the job of the enumerator at each step.


the selected areas, the application of definitions, the handling of border-


It is also necessary to watch carefully the work of the
enumerators. The mere presence of supervisors in the field has a wholesome
effect on their performance. The supervision should be carried out by
superior staff, better paid, better qualified, and more experienced. From
time to time the supervisory staff should itself undertake fieldwork in
order to appreciate the difficulties involved.


4. Control over the quality of the fieldwork and the field edit.
Steps must be taken to ensure that the survey is under statistical
control, that is, that the errors to which it is subject are random and no
assignable causes of variation are present. A system of field checks by the
supervisors should also be introduced. The field checks should
preferably be carried out on a random sub-sample of units, and should be
conducted in such a manner that investigators do not have prior knowledge
of the work going to be checked. If it is found that an enumerator
is not honest or is not following the instructions, all his returns should
be reviewed and he should be removed from the field.
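Drawing the random sub-sample for field checks can be sketched in Python. The sketch assumes, hypothetically, that each enumerator's completed schedules are identified by serial numbers and that about 10% of each enumerator's work is re-checked (at least one schedule each); the function name and the fraction are illustrative, not a prescribed standard. Because the draw is random and made by the supervisor, an enumerator cannot know in advance which schedules will be checked.

```python
import random


def select_for_field_check(returns_by_enumerator, fraction=0.1, seed=None):
    """Draw a random sub-sample of each enumerator's completed schedules
    for re-checking (at least one schedule per enumerator who has
    returned any work)."""
    rng = random.Random(seed)
    selection = {}
    for enumerator, schedules in returns_by_enumerator.items():
        if not schedules:
            continue  # nothing returned yet, so nothing to check
        k = min(len(schedules), max(1, round(fraction * len(schedules))))
        selection[enumerator] = rng.sample(schedules, k)
    return selection


# Hypothetical schedule numbers returned by three enumerators.
returns = {"E1": list(range(1, 41)), "E2": list(range(41, 61)), "E3": []}
chosen = select_for_field_check(returns, fraction=0.1, seed=5)
```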


After the work of collection of data is complete and the
questionnaires or schedules are handed over by the enumerator to the supervisor
while in the field, the supervisor should scrutinize these to check omissions,
inconsistencies, illegible writing and other errors before they are passed
on to the headquarters. This editing is highly useful for several
reasons. Firstly, unless the questionnaires are edited on the spot, the need
for further information to correct some of the wrong entries may only
be discovered when the team has moved to another area; and secondly, if
the errors are discovered at this stage, the enumerator can be instructed
not to make such errors in the future. Also, most of the obvious errors
can be corrected at this stage without making a reference to the
respondent since the interview is still fresh in the mind of the enumerator.
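The supervisor's field edit, a scrutiny for omissions and inconsistencies, can be expressed as a set of checks applied to each schedule. In the Python sketch below the field names (`respondent_id`, `members`, `earners`, `monthly_income`) and the two consistency rules are hypothetical examples; a real survey would derive its checks from its own questionnaire.

```python
# Hypothetical list of items that must be filled in on every schedule.
REQUIRED = ("respondent_id", "members", "earners", "monthly_income")


def field_edit(schedule):
    """Return a list of problems found in one filled schedule (a dict)."""
    # Omissions: required items left blank or missing altogether.
    problems = [f"missing: {f}" for f in REQUIRED if schedule.get(f) in (None, "")]

    # Inconsistencies: entries that cannot all be true together.
    members, earners = schedule.get("members"), schedule.get("earners")
    if isinstance(members, int) and isinstance(earners, int) and earners > members:
        problems.append("inconsistent: more earners than household members")
    income = schedule.get("monthly_income")
    if isinstance(income, (int, float)) and income < 0:
        problems.append("inconsistent: negative income")
    return problems


# Usage: one schedule with an inconsistency, one with an omission.
print(field_edit({"respondent_id": 7, "members": 3, "earners": 5, "monthly_income": 900}))
print(field_edit({"respondent_id": 8, "members": 4, "earners": 1}))
```

Returning a list of problems rather than stopping at the first one lets the supervisor raise all queries with the enumerator at once, while the interview is still fresh.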


5. Follow-up of non-response. In spite of best efforts, some of the
respondents may not co-operate. A suitable machinery should be set up for
dealing with those cases from whom the required information could not be
obtained due to lack of response. One method of dealing with the
non-response problem is to make a list of the non-respondents and take a small
sub-sample of them. Then with the help of supervisory staff vigorous efforts
can be made to secure a response.
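The text does not say how the follow-up sub-sample is used in estimation. One standard approach, not named in the text, is the Hansen-Hurwitz double-sampling estimator: the mean of the followed-up sub-sample stands in for all non-respondents, weighted by their share of the original sample. The Python sketch assumes the sub-sample was drawn at random from the non-respondents and that every unit in it eventually responded; the expenditure figures are hypothetical.

```python
from statistics import mean


def estimate_with_followup(respondent_values, n_nonrespondents, followup_values):
    """Estimate the population mean when a random sub-sample of the
    non-respondents has been followed up (Hansen-Hurwitz estimator)."""
    if n_nonrespondents and not followup_values:
        raise ValueError("follow-up sub-sample is empty")
    n1 = len(respondent_values)
    n = n1 + n_nonrespondents
    total = n1 * mean(respondent_values)
    if n_nonrespondents:
        # The sub-sample mean represents all n_nonrespondents units.
        total += n_nonrespondents * mean(followup_values)
    return total / n


# Hypothetical monthly expenditure (Rs.): 4 households responded at once,
# 4 did not; 2 of those 4 were later persuaded by the supervisory staff.
first_round = [200, 220, 180, 200]
followed_up = [300, 260]
naive = mean(first_round)  # ignores the non-respondents altogether
adjusted = estimate_with_followup(first_round, 4, followed_up)
```

In this made-up example the non-respondents spend more than the first-round respondents, so ignoring them (the naive mean) would understate expenditure. The same kind of bias arises when enumerators substitute "easy" respondents, as the next paragraph warns.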

It is important to see that enumerators are not allowed to make
substitutions for those not found. If this practice is followed, the
enumerators will not take pains to persuade the non-respondent to co-operate
and there will be a tendency to substitute for anyone who is not considered
to be a good respondent, which will introduce bias in the survey results.


6. Processing of the Data. After data have been collected, the scene
shifts from the field to the office. The data are to be given a thorough check,
coded, transferred to cards or tape and tabulated. These operations are
in no way less important than the collection of data. There are chances
of errors arising at every step and hence one has to be cautious. While
editing, it is necessary to see that the questionnaires are complete in every
respect and the information supplied is consistent and accurate.

The responses in the edited questionnaires may be coded. The
process of coding involves translating responses into numerical terms in order
to facilitate the analysis. For this a list of codes is set up. For example,
the sex of the respondents may be coded as: male = 1, female = 2. After the
material is edited and coded, it is ready for analysis, which can be
performed either by hand or by machines. If machines are available, the
coded information is transferred to punched cards before actual
calculation begins. The operator punches holes in the cards at the
appropriate places as he runs his eye over the codes in the questionnaire.
The punched cards are then verified.
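Coding with a code list, and the tabulation that follows, can be sketched in Python. The code for sex (male = 1, female = 2) is taken from the text; the `literate` question and its codes are hypothetical additions for illustration. An answer that is not in the code list raises a `KeyError`, so uncoded or misspelt responses are caught rather than silently tabulated.

```python
from collections import Counter

# Code list: sex codes follow the text; 'literate' is a hypothetical item.
CODE_LIST = {
    "sex": {"male": 1, "female": 2},
    "literate": {"yes": 1, "no": 2},
}


def code_response(response):
    """Translate one edited response (question -> text answer) into codes."""
    return {q: CODE_LIST[q][answer.strip().lower()] for q, answer in response.items()}


# Hypothetical edited responses.
responses = [
    {"sex": "Male", "literate": "yes"},
    {"sex": "female", "literate": "no"},
    {"sex": "female", "literate": "yes"},
]
coded = [code_response(r) for r in responses]

# Simple tabulation of the coded data: frequency of each sex code.
sex_tally = Counter(r["sex"] for r in coded)
```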

A great deal of survey work these days is tabulated by computers.
Computers not only save time but also make it possible to study a large
number of variables affecting a problem simultaneously. By means of
such techniques as computer simulation, it is possible to simulate and
analyse the operation of extremely complex systems, which cannot be studied
economically by other means.

7. Preparation of Report. After the data have been collected and
analysed, it is usually necessary to embody the results of the survey in the
form of a report. The preparation of the report, therefore, constitutes the
final step in execution. Two kinds of reports may be presented: either a
general report giving a description of the survey for the use of those who
are primarily interested in the results, or a technical report giving details of
the sample design, computational procedures, accuracy and allied aspects.
In this connection, it will be useful to follow the recommendations made
by the United Nations Statistical Office on the subject of the preparation
of reports. These recommendations, both for the general report and for the
technical report, are briefly described below :

In a general report the following aspects of the survey should be 
highlighted : 

(i) Statement of the purpose of the survey. A general indication should
be given of the objects of the survey, the permissible margin of error
and the ways in which it is expected that the results will be utilized.

(ii) Description of the coverage. An exact description should be given 
of the geographic region or branch of an economy or social group or other 
categories or constituent parts of a population covered by the survey.

(iii) Collection of information. The nature of the information
collected should be reported in considerable detail, including a statement of
items of information collected but not reported upon. The method of 
collecting data should be reported together with the nature of steps taken 
to ensure that the information is as complete as possible. The extent and 
causes of nonresponse, etc., should be stated. It is also desirable to repro- 
duce in the report the copies of the questionnaire or other schedules used 
in the survey. 

(iv) Numerical results. A general indication should be given of the
methods followed in the derivation of the numerical results. Particulars 
should be given of methods of weighting and of any supplementary infor- 
mation utilized, for example, to obtain ratio estimates. 

(v) Accuracy attained. A general indication of the accuracy attained
should be given and a distinction should be made between sampling and
non-sampling errors. 

(vi) Miscellaneous considerations. It is also important to touch upon
such aspects as: the period to which the data refer and the time taken for
the field work; whether the survey is an isolated one or is one of a series
of similar surveys; an indication of the cost of the survey under such
headings as preliminary work, field investigations, analysis, etc.; and the
extent to which the objects of the survey were fulfilled. It is also desirable
to give references to any available reports, papers or other publications
relating to the survey.

As regards the technical report, the following aspects should be clearly brought to light :

(i) Specification of the frame. A detailed account of the specifica- 
tion of the frame should be given providing such information as the geo- 
graphic areas and categories of material included and the date and source 
of the frame. 

(ii) Design of the survey. The sampling design should be carefully 
specified including details such as types of sampling units, particulars of 
stratification, etc. 

(iii) Personnel and Equipment. It is desirable to give an account of 
the organisation of the personnel employed in collecting, processing and 
tabulating data. Arrangements for training, inspection, and supervision 
of the staff should be explained. 

(iv) Statistical analysis and computational procedure. The statistical 
method followed in the compilation of the final summary tables from the 
primary data should be described. It is also desirable to reproduce the 
formulae used. 

(v) Comparisons with other sources of information. Every reasonable 
effort should be made to provide comparisons with other independent 
sources of information. Such comparisons should be reported along with 
the other results, and any significant differences should be discussed. The object of this is not to throw light on the sampling error, since a well-designed survey provides adequate internal estimates of such errors, but rather to gain knowledge of biases and other non-random errors.

(vi) Observations of Technicians. The critical observations of techni- 
cians in regard to the survey, or any part of it, should be given. These 
observations will help others to improve their operations. 

It should be noted that a survey demands utmost care at each phase 
of the activity: poor work in one phase may ruin a survey in which 
everything else is done well. 


3 Sources of data


Once the purpose of the survey has been clearly defined, the next problem is that of obtaining the necessary data from suitable sources. There are two alternatives here :

1. Either an original investigation may be undertaken, or

2. Data gathered by someone else may be utilised. 

Data originally collected for an investigation are known as primary
data whereas those obtained from published or unpublished sources are 
known as secondary data. The secondary data constitute the chief 
material on the basis of which statistical work is carried out in many 
investigations. It should be noted that it is the process of assembling 
primary data which is called ‘collection’ of statistics and is different from 
the process of ‘compiling’ statistics (i.e., secondary data) from various 
published sources. To quote Crum, Patton and Tebbutt, “Collection 
means the assembling, for the purpose of a particular investigation, of 
entirely new data, presumably not already available in published sources."* 
We have used the term ‘collection’ in this book strictly in the narrow sense 
defined above. 

Difference between Primary and Secondary Data 

The difference between primary and secondary data is a matter of relativity : data which are primary in the hands of one become secondary in the hands of another. Data are primary for the individual agency or institution collecting them, whereas for the rest of the world they are secondary. A few examples would clarify the distinction. Suppose an investigator wants data about the spending habits of the students of Delhi University. If he collects the data himself or through his agents, adopting any suitable method such as contacting and interviewing students or circulating a questionnaire, the data would constitute primary data for him. On the other hand, if the students' union has already made a similar survey and the investigator obtains the data from the union office, such data would constitute secondary data for him. Similarly, statistics collected by various departments of the Government, such as the Labour Bureau and the Central Statistical Organisation, are primary for the respective departments, whereas for all others they constitute secondary data.
Advantages and Limitations of Secondary Data 

Use of secondary data offers the following advantages : 

(1) It is much cheaper to use information which someone else has compiled. There is no need for printing data collection forms, hiring enumerators, editing and tabulating the results, etc.


* Crum, Patton and Tebbutt : Economic Statistics, McGraw-Hill Book Company, New York.
(2) If secondary data are available, they are much quicker to obtain than primary data. A field project frequently takes 3-4 months to complete, while secondary data can be collected in a few days.

(3) Secondary data are available on some subjects where it would be impossible to collect primary data. For example, census data cannot be collected by an individual or research organisation, but can be obtained from Government publications.
However, two major problems are encountered in using secondary data :
(1) The first is the difficulty of finding data which exactly fit the needs of the present project.
(2) The second problem is finding data which are sufficiently accurate.
Choice between Primary and Secondary Data 
The investigator must decide at the outset whether he will use primary data or secondary data in an investigation. The choice between the two depends on the following considerations :
(1) Nature and scope of the enquiry,
(2) Availability of financial resources,
(3) Availability of time,
(4) Degree of accuracy desired, and
(5) The status of the investigator, i.e., individual, corporation, Government, etc.

It may be pointed out that most statistical analysis rests upon secondary data. Primary data are generally used in those cases where the secondary data do not provide an adequate basis for analysis. In certain cases, both primary and secondary data may be employed. The reason why secondary data are being increasingly used is that published statistics are now available covering diverse fields, so that an investigator finds the required data readily available to him in many cases.

METHODS OF COLLECTING PRIMARY DATA 
Primary data may be obtained by applying any of the following methods :
I. Direct personal interviews.
II. Indirect oral interviews.
III. Information from correspondents.
IV. Mailed questionnaire method.
V. Schedules sent through enumerators.

These methods are discussed below : 

I. Direct Personal Interviews 
Under this method of collecting data, the investigator personally comes in contact with the persons from whom the information is to be obtained (known as informants). He asks them questions pertaining to the enquiry and collects the desired information. Thus, if a person wants to collect data about the wages of workers of the Delhi Cloth Mills, he would go to the Mill, contact the workers and obtain the required information. The information thus obtained is first-hand or original in character.


Merits. The advantages of personal interviews are : 


1. Response is more encouraging, as most people are willing to supply information when approached personally.


2. The information obtained by this method is likely to be more accurate because the interviewer can clear up doubts of the informants about certain questions and thus obtain correct information. In case the interviewer apprehends that the informant is not giving accurate information, he may cross-examine him and thereby try to obtain the correct information.


3. It is also possible through personal interview to collect supplementary information about the informant's personal characteristics and environment, and such information often proves very useful while interpreting results.


4. Questions about which the informant is likely to be sensitive can be carefully sandwiched between other questions by the interviewer. He can twist the questions keeping in mind the informants' reactions. He can


In other words, a delicate situation can usually be handled more effectively by a personal interview than by other survey techniques.


5. The language of communication can be adapted to the status and educational level of the person interviewed, thus avoiding inconvenience and misinterpretation on the part of the informants.
Limitations. Some limitations of the personal interview method are :


1. It may be very costly where the number of persons to be interviewed is large and they are spread over a wide area.


2. The chances of personal prejudice and bias are greater under this method as compared to other methods.


3. The interviewers have to be thoroughly trained and supervised, otherwise they may not be able to obtain correct information. Untrained or poorly trained people may spoil the entire work.


4. More time is required for collecting information under this method as compared to others. This is because interviews can be held only at the convenience of the informants. Thus, if information is re-


Suitability. This method is suitable for intensive rather than extensive field surveys. Hence, it should be used only in those cases where an intensive study of a limited field is desired.
II. Indirect Oral Interviews

Under this method of collecting data, the investigator contacts third parties or witnesses capable of supplying the necessary information. This method is generally adopted in those cases where the information to be obtained is of a complex nature and the informants are not inclined to respond if approached directly. For example, in an enquiry regarding addiction to alcoholic drinks, people may be reluctant to supply information about their own drinking habits. It would be necessary, in that case, to get the desired information from dealers of liquor or other people who may know them, for example, their neighbours, friends, etc. Similarly, if a fire has broken out at a certain place, the cause of the fire may be traced by contacting persons living in the neighbourhood of that area. In a similar manner, clues about thefts or murders are obtained by the police by interrogating third parties who are supposed to have knowledge about the problem under investigation. Enquiry committees and commissions appointed by the Government generally adopt this method to get people's views and all possible details of facts relating to the enquiry.

This method is very popular in practice. However, the correctness of 
information obtained depends upon a number of factors, such as : 


1. The type of persons whose evidence is being recorded. If the people do not know the full facts of the problem under investigation, or if they are prejudiced, it will not be possible to arrive at correct conclusions.

2. The ability of the interviewers to draw out the information from 
witnesses by means of appropriate questions and cross-examination. 

3. The honesty of the interviewers who are collecting the information. It might happen that because of bribery, nepotism or certain other reasons, those who are collecting the information give it such a twist that correct conclusions are not arrived at.

For the success of this method it is necessary that the evidence of one person alone is not relied upon ; the views of a number of persons should be ascertained to find the real position. Utmost care must be exercised in the selection of these persons, because it is on their views that the final conclusions are reached.

Suitability. This method is suitable in cases where indirect sources of information have to be tapped, either because direct sources do not exist, or cannot be relied upon, or would be reluctant to part with the information.


III. Information from Correspondents

Under this method, the investigator appoints local agents or correspondents in different places to collect information. These correspondents collect and transmit the information to the central office where the data are processed. Newspaper agencies generally adopt this method. Correspondents in different places supply information relating to such events as accidents, riots, strikes, etc., to the head office. The correspondents may be paid or honorary persons, but generally they are paid. This method is also adopted by various departments of the government in such cases
where regular information is to be collected from a wide area. For example, 
investigations. However, it may not ensure very accurate results because of the personal prejudice and bias of the correspondents.
Suitability. As stated above, this method is generally adopted in those cases where the information is to be obtained at regular intervals from a wide area.
IV. Questionnaire sent by Post 
Under this method, a list of questions pertaining to the enquiry (known as a questionnaire) is prepared and sent to the various informants by post. The questionnaire contains questions and provides space for answers.


Merits. 1. This method of collecting data can be easily adopted where the field of investigation is very vast and the informants are spread over a wide geographical area.

2. It is also relatively cheap and expeditious, provided the informants respond in time.

Limitations. 1. This method can be adopted only where the informants are literate people, so that they can understand written questions and send the answers in writing.

2. It involves some uncertainty about the response. Co-operation on the part of informants cannot be taken for granted.

3. The information supplied by the informants may not be correct 
and it may be difficult to verify the accuracy. 

The success of this method depends upon the skill with which the questionnaire is drafted and the extent to which the willing co-operation of the informants is secured. Since the advantages of personal contact are lost in the mailed questionnaire, the form and tone of the questionnaire must be designed to supply, as far as possible, the missing personal element. Where the information is required by a government department, it is generally available on account of legal or administrative sanctions. In other cases, it is necessary to take informants into confidence so that they furnish correct information.


To make this method work effectively, the following suggestions are made :


1. The questionnaire should be so framed that it does not become an undue burden on the respondents ; otherwise they may not return it.

2. Prepaid postage stamps should be affixed.

3. The sample should be large. 

4. It should be adopted in such enquiries where it is expected that the respondents would return the questionnaire because of their own interest in the enquiry.

5. Its use should be preferred in such enquiries where there could be a legal compulsion to supply the information, so that the risk of non-response is eliminated.
Suitability. This method is appropriate in cases where informants are spread over a wide area, i.e., in the case of extensive surveys.


V. Schedules* sent through Enumerators
Yet another method of collecting information is that of sending schedules through the enumerators or interviewers. The enumerators contact the informants, get replies to the questions contained in a schedule and fill them in their own handwriting in the schedule forms. The essential difference between the mailed questionnaire method and this method is that whereas in the former the questionnaire is sent to the informants by post, in the latter the enumerators carry the schedule personally to the informants. This method is free from some of the limitations of the mailed questionnaire method.
Merits. The main advantages of the method are :
1. It can be adopted in those cases where informants are illiterate. 
2. There is very little non-response as the enumerators go personally 
to obtain the information. 
3. The information received is more reliable, as the accuracy of statements can be checked by supplementary questions wherever necessary.

Limitations. 1. Amongst the various methods of collecting primary 
data, this method is quite costly as enumerators are generally paid 
persons. 
2. The success of the method depends largely upon the training imparted to the enumerators.

3. Skilled interviewing requires experience and training, but there is a tendency for statisticians to neglect this extremely important part of the data-collecting process. Without good interviewing, most of the information collected is of doubtful value.

4. The way in which the enumerators conduct the interview would 
affect the data collected. When questions are asked by a number of 
different interviewers it is possible that variations in the personalities of 
the interviewers will cause variation in the answers obtained. This variation will not be obvious. Hence every effort must be made to remove as much as possible of the variation due to different interviewers.

Suitability. This method is quite popularly used in practice. The 
main reason for this is a very high rate of response because of the 
personal contact of the enumerators. 


Drafting the Questionnaire 

Before framing the questionnaire it is essential to set out in detail the ideal data which we desire from the answers to the questionnaire. It would be wise if we could construct the sort of tables which we would like to emerge from the enquiry. It is not always possible to set out in advance all the ideal data we should like, since we may learn many things in the course of the enquiry and may thus find that what we believed to be ideal was not in fact ideal. For this reason those who are likely to be concerned with analysing the results should be called in at the very earliest stage.


* A distinction is often made between a questionnaire and a schedule. Questionnaire refers to a device for securing answers to questions by using a form which the respondent fills in himself. Schedule is the name usually applied to a set of questions which are asked and filled in a face-to-face situation with another person.
For example, it may not be very appropriate for a government statistician to collect some data on, say, unemployment and then hand them over to an economist to analyse. The wise thing would be to consult the economist first on what data were desirable.


The success of the questionnaire method of collecting information 
depends largely on the proper drafting of the questionnaire. Drafting a questionnaire is a highly specialised job and requires a great deal of skill
and experience. It is difficult to lay down any hard and fast rules to be 
followed in this connection. However, the following general principles 
may be helpful in framing a questionnaire : 


1. Covering letter. The person conducting the survey must introduce himself and state the objective of the survey. The following steps are desirable :


(i) Enclose a short letter. The letter should state in as few words as possible the purpose of the survey and how the informant would tend to benefit from it.


(ii) Enclose a self-addressed envelope for the respondent's convenience in returning the questionnaire.


(iii) Assure the respondent that his answers will be kept in strictest 
confidence. 


(iv) Promise the respondent that he will not be solicited after he fills up the questionnaire.

(v) If possible, offer special inducements (free gifts, concession coupons, etc.) to return the questionnaire.


(vi) If the respondent is interested, promise him a copy of the results of the survey.


2. Number of questions should be small. The number of questions should be kept to the minimum. The precise number of questions to be included would naturally depend on the object and scope of the investigation. Fifteen to twenty-five may be regarded as a fair number. If a lengthy questionnaire is unavoidable, it should preferably be divided into two or more parts.


3. Questions should be arranged logically. The questions must be arranged in a logical order so that a natural and spontaneous reply to each is induced. They should not skip back and forth from one topic to another. Thus it is undesirable to ask a man how many children he has before asking whether he is married or not. Similarly, it would be illogical to ask a man his income before asking him whether he is employed or not. Thus the sequence of the questions should be considered carefully in terms of the purpose of the study and the persons who will supply the information. Questions supplying identification and description of the respondent should come first, followed by major information questions. If opinions are requested, such questions should usually be placed at the end of the list. Two different questions, worded differently, may be included on the same subject to provide a cross-check on important points.


4. Questions should be short and simple to understand. Unless the person being interviewed is technically trained, technical terms should be avoided. Words such as "capital" or "income" that have different meanings for different persons should not be used unless a clarification is included in the question.


Ambiguous questions ought to be avoided. Ambiguous questions mean different things to different people, and it will not be possible to obtain comparable replies from respondents who take a question to mean different things. For example, a question asked of those who visit Super Bazar is ‘Do you think the salesmen at different counters need training ?’ Such a question would be ambiguous because of differences of opinion as to what constitutes training : does it mean a formal programme of classes extending over a period of a few days, weeks or months, or would a few days of working with highly experienced salesmen constitute training, etc. ?

5. Personal questions should be avoided. As far as possible, ques- 
tions of a personal and pecuniary nature should not be asked. For exam- 
ple, questions about sources of income, volume of sales, etc. may not be 
willingly answered in writing. Where such information is essential, it 
should be obtained by personal interviews. Even then, such questions 
should be asked only at the end of the interview, when the informants feel 
more at ease with the interviewer. 


6. Instructions to the informants. The questionnaire should provide 
necessary instructions to the informants. For example, the questionnaire 
should specify the time within which it should be sent back and the place 
where it should be sent. Instructions about unit of measurement, etc., 
should also be given. For instance, if there is a question on weight, it 
should be specified as to whether weight is to be expressed in pounds or 
kilograms or some other unit. 

7. Questions should be capable of objective answer. Avoid questions 
of opinion and keep to questions of fact. In factual studies, it is highly 
desirable that questions are so designed that objective answers may be 
forthcoming. For example, instead of asking the condition of a building 
and allowing the informant or enumerator to state the condition in his 
own words, it is desirable to ask if a structure was in good condition, 
needed minor repairs, needed structural repairs or was unfit for use. No 
doubt, answers to such questions may not be completely objective but 
they can be readily tabulated. Similarly, while asking students how they normally travel to college, frame a question of the type :


How do you normally travel to college ? 
(i) By bus
(ii) By your own car
(iii) By your own scooter
(iv) By taxi
(v) By three-wheeler scooter
(vi) On foot
(vii) Any other
The respondent will tick the particular alternative applicable to him. This type of question is known as a multiple-choice question. It suggests several answers among which the respondent may choose. If a multiple-choice question is used, all alternatives should be stated and a ‘don't know’ category should be left in the questionnaire. Such questions not only facilitate tabulation but also take very little of the respondent's time. This type of question is excellent if most of the possible answers are both known and few in number. When the possible answers are numerous, a limited list, even if accompanied by an “other” category, may elicit a response different from that which otherwise would be forthcoming. Multiple-choice questions also tend to bias results by the order in which alternative answers are given : when ideas are involved, the first item in the list of alternatives has a favourable bias. The use of multiple-choice questions is indicated only when the investigator is confident of the existence of a limited group of important alternatives, and it should be avoided when there are many possible responses of relatively equal significance.
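To show why pre-coded alternatives make tabulation easy, here is a minimal Python sketch that counts ticks for the travel-to-college question above. The function name `tabulate` and the sample replies are hypothetical. The sketch assumes each respondent ticks exactly one alternative, and it adds the recommended "Don't know" category as a row of its own.

```python
from collections import Counter

# Pre-coded alternatives for the travel-to-college question, with the
# "Don't know" category recommended in the text added at the end.
ALTERNATIVES = [
    "By bus",
    "By your own car",
    "By your own scooter",
    "By taxi",
    "By three-wheeler scooter",
    "On foot",
    "Any other",
    "Don't know",
]


def tabulate(responses):
    """Return (alternative, count) pairs in questionnaire order.

    Every alternative appears in the table, even if nobody ticked it,
    and any answer outside the pre-coded list is rejected.
    """
    unknown = [r for r in responses if r not in ALTERNATIVES]
    if unknown:
        raise ValueError(f"answers outside the pre-coded list: {unknown}")
    counts = Counter(responses)
    return [(alt, counts[alt]) for alt in ALTERNATIVES]


# Hypothetical replies from five respondents (illustrative only).
replies = ["By bus", "On foot", "By bus", "Don't know", "By your own scooter"]
for alternative, n in tabulate(replies):
    print(f"{alternative:<26}{n}")
```

Because the categories are fixed before the survey, tabulation reduces to counting. An open question would first need its free-text replies grouped into categories by hand.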


8. “Yes” or “No” questions. As far as possible, the questions should be of such a nature that they can be answered easily in ‘Yes’ or ‘No’. Such questions pose a simple alternative to the respondent. This is an excellent technique if applied to situations where a clear-cut alternative exists. The questions “Do you own a car ?”, “Are you married ?”, “Did you vote in the last election ?” can easily be answered with a “yes” or “no”. However, when the alternative is not clear-cut, the “yes or no” type question should be avoided. A question such as “Are you in favour of the policies of the Congress party ?” usually cannot be answered with a simple reply. The Congress party has so many policies that only the most radical partisan would favour or oppose them all. A typical citizen may endorse many, have no opinion on some, and reject others. The “yes” or “no” question in this case compels him to compress a variety of opinions into a simple alternative which may, in reality, not exist.


Sometimes a respondent cannot give a simple “yes” or “no” answer, either because he has not yet made up his mind or because he lacks information on the topic. For example, the answer to the question “Are you in favour of public schools ?” may not always be ‘yes’ or ‘no’, because the respondent has not thought it over. In such cases additional categories such as ‘do not know’, ‘undecided’, ‘no opinion’ should be included.


9. Specific information questions and open questions. Specific information questions call for a specific item of information, for example, “What is your age ?”, “How many children do you have ?”, etc. These questions are simple and direct and are well adapted to securing information of this type. Care should be taken to use this type of question only where the respondent can answer and will answer correctly. The open question does not pose alternatives or request specific information. It leaves the respondent free to make whatever reply he chooses. For example, the question “Please suggest measures to enhance the practical utility of the B.Com. course” is a question of this type. In many ways the open question is superior to other types : there is no danger of being unduly restrictive, suggesting answers, posing false alternatives or introducing some bias. It may also serve to interest the respondent in the interview itself, especially if he is asked his opinion at the outset. However, open questions are difficult to tabulate : since no restriction is placed upon the variety of answers, many different answers will often be forthcoming. This not only increases the labour involved but frequently leads to improper tabulation. Hence every effort must be made to minimize open questions in the questionnaire.

10. Questionnaire should look attractive. Some care should be
taken in the actual setting out of the questionnaire. It should be made 
to look as attractive as possible. Plenty of space should be left for 
answers depending upon the type of questions. 

11. Questions requiring calculations should be avoided. Questions should not require calculations to be made. For example, informants should not be asked their yearly income, for in most cases they are paid monthly.

Similarly, questions necessitating the calculation of ratios, percentages, etc., should not be asked, as this may take much time and the informant may not send back the questionnaire at all.

12. Pre-testing the questionnaire. The questionnaire should be pre-tested with a group before mailing it out. The advantage of pre-testing is that the shortcomings of the questionnaire can be discovered and it can be revised in the light of the tryout.

13. Cross-checks. If possible, one or more cross-checks should be incorporated into the questionnaire to determine whether the respondent is answering the questions conscientiously.
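Once replies are coded, a cross-check of this kind (for instance, the two differently worded questions on the same subject suggested under point 3) can be screened mechanically. The following is a minimal sketch assuming a hypothetical pair of questions, one asking age in years and one asking year of birth. The field names, the survey year and the one-year tolerance are illustrative choices, not part of the text.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    respondent_id: int
    age_years: int      # answer to a hypothetical "What is your age?"
    year_of_birth: int  # answer to a hypothetical "In which year were you born?"


def fails_cross_check(reply, survey_year, tolerance=1):
    """True if the two answers disagree by more than `tolerance` years.

    A tolerance of one year allows for respondents whose birthday has
    not yet come round in the survey year.
    """
    implied_age = survey_year - reply.year_of_birth
    return abs(implied_age - reply.age_years) > tolerance


# Hypothetical coded replies (illustrative only).
replies = [
    Reply(1, 34, 1943),
    Reply(2, 25, 1962),
    Reply(3, 40, 1937),
]
flagged = [r.respondent_id for r in replies if fails_cross_check(r, survey_year=1977)]
print(flagged)  # [2]
```

A flagged reply is not necessarily dishonest; the check only marks the questionnaire for scrutiny before it is tabulated.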

14. Method of tabulation. The method to be used for tabulating the results should be determined before the final draft of the questionnaire is made. If the results of the questionnaire are to be computerised, it is desirable to consult the computer experts before making a final draft.


Pre-testing the Questionnaire 

Before the final form of the questionnaire is adopted, it is desirable to carry out a preliminary experiment on a sample basis. When questionnaires are to be distributed on a large scale, it is absolutely essential to test them. There are many advantages of pre-testing the questionnaire, such as :

1. The investigator can find out the drawbacks of the questionnaire, i.e., which questions ought to be deleted and which more ought to be added.
2. An idea can be formed about the extent of non-response likely to take place.
3. Greater co-operation of the informants can be secured. Even persons most allergic to writing can, with proper inducement, be prevailed upon to answer the questionnaire. It is the surveyor's job to find out what these inducements are.
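The pre-test's estimate of non-response is a simple proportion, and one way it might be used is to size the main mailing. The sketch below is illustrative only: the function names and the pre-test figures are hypothetical, and it assumes the response rate observed in the pre-test carries over to the main survey, which need not hold in practice.

```python
def non_response_rate(sent, returned):
    """Share of pre-test questionnaires that did not come back."""
    if sent <= 0 or not 0 <= returned <= sent:
        raise ValueError("need sent > 0 and 0 <= returned <= sent")
    return (sent - returned) / sent


def copies_to_mail(target_returns, pilot_sent, pilot_returned):
    """Copies to mail so that about `target_returns` completed forms come
    back, assuming the pre-test response rate holds in the main survey.
    Integer ceiling division avoids floating-point rounding."""
    if pilot_returned <= 0:
        raise ValueError("no returns in the pre-test: response rate unknown")
    return -(-target_returns * pilot_sent // pilot_returned)


# Hypothetical pre-test: 50 questionnaires mailed, 20 returned.
print(non_response_rate(50, 20))    # 0.6
print(copies_to_mail(400, 50, 20))  # 1000
```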

While pre-testing the questionnaire, it is important always to cover a cross-section of the population eventually to be surveyed. When the sample is drawn, it should be broken down into various sub-samples by taking, for instance, every tenth or every hundredth case from the entire list.
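Taking "every tenth or every hundredth case" is systematic selection from the list. A minimal sketch, assuming the informants are held in a Python list in their original order. The text does not say how the starting point is chosen, so it appears here as a parameter; a random start between 0 and k-1 is a common choice. The function names and the 1,000-case list are hypothetical.

```python
def systematic_subsample(cases, k, start=0):
    """Every k-th case from the full list, beginning at position `start`."""
    if k < 1 or not 0 <= start < k:
        raise ValueError("need k >= 1 and 0 <= start < k")
    return cases[start::k]


def split_into_subsamples(cases, k):
    """Break the list into k non-overlapping sub-samples, one for each
    possible starting position."""
    return [systematic_subsample(cases, k, s) for s in range(k)]


# Hypothetical list of 1,000 informants identified by serial number.
population = list(range(1, 1001))
pilot = systematic_subsample(population, k=100)  # every hundredth case
print(len(pilot), pilot[:3])  # 10 [1, 101, 201]

subsamples = split_into_subsamples(population, k=10)  # every tenth case
print([len(s) for s in subsamples])  # ten sub-samples of 100 each
```

Because each sub-sample is spread evenly through the whole list, each tends to reflect the cross-section of the population that the text asks the pre-test to cover.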

The work of pre-testing the questionnaire must be done with utmost care and caution, otherwise unwanted changes may be introduced. Proper testing, revising and re-testing of the questionnaire would yield high dividends.
Specimen Questionnaires


The following two specimen questionnaires incorporate most of the qualities of a good questionnaire. The first questionnaire was designed to find out how far the M.B.A. programme helps working executives in developing skills. The second questionnaire relates to Super Bazar ; it aimed at evolving better ways of providing shopping facilities to consumers.


QUESTIONNAIRE NO. 1* 
“THE DEVELOPMENT OF MANAGERIAL SKILLS AT 
THE INSTITUTIONAL LEVEL” 


This study, partly financed by the U.G.C., has been undertaken with the
following three objectives:

(1) To find out how far the M.B.A. programme helps the working executives in
developing skills.

(2) To find out whether an M.B.A. is better than a non-M.B.A. on the same
job.

(3) To find out in what ways the M.B.A. programme can be made more useful
from the practical point of view.


The work is purely of academic interest and as such YOUR IDENTITY
WILL NOT BE REVEALED AND THE INFORMATION SUPPLIED WILL
BE KEPT STRICTLY CONFIDENTIAL. The findings will be used for a Ph.D.
thesis to be submitted to the Faculty of Management Studies, University of Delhi.
You are requested to make your valuable contribution to this project by
supplying the information asked for in the questionnaire.


Deptt. of Commerce,
S.R. College of Commerce, 
Delhi University, Delhi. 


Questionnaire
PART I 


Background Information 


(1) Age (years):
(2) Sex: Male [ ]  Female [ ]
(3) Marital Status: Married [ ]  Unmarried [ ]  Any other [ ]
(4) Number of Dependants: (a) Children [ ]  (b) Others [ ]
(5) Religion: Hindu [ ]  Muslim [ ]  Sikh [ ]  Christian [ ]
(6) If Married:
(a) Educational Qualifications of wife:
(i) Matric/Higher Secondary [ ]
(ii) Graduate [ ]
(iii) Post-graduate [ ]
(iv) Any special training/course (specify)
(b) Is your wife employed? Yes/No
(c) If yes, specify the nature of her job.


* This was one of the questionnaires used by the author of this text for collecting
data.
(7) (a) Father's Occupation and Designation
(b) Father's approximate monthly income Rs. 
(present or last drawn) 


PART II


Education and Employment 


(1) (a) Educational Qualifications (from Higher Secondary or equivalent onwards):

Examination passed | Year | Division

(b) Mode of School Education
(i) Government School [ ]
(ii) Public School [ ]
(iii) Any other (specify) [ ]


(2) Past and Present Employment

Organisation | Designation | Nature of job | Period of Employment | Total monthly income including allowances


(3) (a) Have you changed your organisation after/during M.B.A.?
(b) If yes, please specify reasons for the change.
(c) If not, what was your status before joining and after passing M.B.A.?

Before joining M.B.A. | After passing M.B.A.: 1 year | 2 years | 3 years | 4 years | 5 years

(i) Total monthly income
(ii) Designation
(iii) Number of subordinates (excluding class IV)
(4) (a) Do you find scope in your organisation for implementing what you learnt in the M.B.A. programme?
(i) To a very large extent [ ]
(ii) To some extent [ ]
(iii) Not at all [ ]
(b) If not, why? (Tick whichever is applicable.)
(i) [...] is not conversant with the modern management techniques [ ]
(ii) The top management does not appreciate modern management techniques [ ]
(iii) Government policy is not conducive to such changes [ ]
(iv) Period of employment in the organisation is insufficient [ ]
(v) The organisation is too small [ ]
(vi) The nature of duties assigned [ ]
(vii) The environment is not congenial [ ]
(viii) Any other (specify) [ ]


PART III
Objective of joining M.B.A. and its attainment
1. Who inspired you to join M.B.A.?
(i) Your boss [ ]
(ii) Your friend who has done M.B.A. [ ]
(iii) Self-inspiration [ ]
(iv) Any other (specify) [ ]
2. What was your objective of joining M.B.A.?
(i) To [...] job prospects in the present organisation [ ]
(ii) To shift to a better job [ ]
(iii) To improve job performance [ ]
(iv) To acquire knowledge of the latest management techniques [ ]
(v) To enhance professional image [ ]
(vi) To develop contacts with people [ ]
(vii) To create opportunities to go abroad [ ]
(viii) To set up a new business [ ]
(ix) To run your business more scientifically [ ]
(x) Any other (specify) [ ]


Objective/Objectives | Substantially | Moderately | Not at all

4. Which of the following benefits have you derived because of M.B.A.?
(a) Promotion:
(i) with increase in salary: Yes [ ]  No [ ]
(ii) without increase in salary: Yes [ ]  No [ ]
(b) Special increments: Yes [ ]  No [ ]
(c) Cash award: Yes [ ]  No [ ]
(d) Merit bonus: Yes [ ]  No [ ]
(e) Job satisfaction: Yes [ ]  No [ ]
(f) Recognition: Yes [ ]  No [ ]
(g) Prestige: Yes [ ]  No [ ]
(h) Increase in authority: Yes [ ]  No [ ]
(i) Any other (specify)
5. How do you place yourself with regard to the following skills before and
after M.B.A.?

1. [...]
2. Ability to make decisions
3. Ability to understand people
4. Ability to view the enterprise [...]
5. Ability to build one's own [...]
6. [...]
7. Analytical ability
8. Appreciating human values
9. Co-ordinating
10. Controlling
11. Creativity
12. Developing subordinates
13. Delegating authority
14. Getting along with others
15. Handling grievances
16. Initiative
17. Maintaining objectivity
18-23. [...]
24. Recognising one's own strengths and weaknesses
25. Technical competence
26. Using working time effectively
27. Working in a team

* To induce people to do their best for the organisation.

VP = Very Poor, P = Poor, AV = Average, G = Good, E = Excellent

PART IV
Miscellaneous Questions 
1. (a) Have you attended any executive development programme? Yes/No
(b) If yes, please specify:

Name of the Programme | Organised by | Duration | Place where held | Expenses borne by Self/Organisation

(c) Did these programmes help you in the development of skills? Yes/No
(d) If yes, name the skills you developed most.


2. Which particular area/areas of management helped you most in developing
your management skills (rank in order of importance)?
(i) General Management [ ]
(ii) Marketing Management [ ]
(iii) Personnel Management [ ]
(iv) Production Management [ ]
(v) Financial Management [ ]
(vi) Behavioural Sciences [ ]
(vii) Quantitative Methods [ ]
(viii) Any other (specify) [ ]


3. Do you use Quantitative Methods in decision-making? Yes/No.
If yes, please tick the methods that you normally use:
(i) [...] [ ]
(ii) Inventory Control Models [ ]
(iii) Network Analysis [ ]
(iv) Business Games and Simulation [ ]
(v) Statistical Techniques [ ]
(vi) Any other (specify) [ ]


4. Membership of Professional Body/Organisation:

Name of the Body/Organisation | Nature of your Association
1.
2.
3.
4.


5. Would you suggest to your friends that they also join M.B.A.? Yes/No.
Give reasons for your answer. 


6. What did you feel with regard to the following as a student of M.B.A.?

Not Satisfactory | Satisfactory | Excellent
(i) Quality of Teaching [ ] [ ] [ ]
(ii) Methods of Teaching [ ] [ ] [ ]
(iii) System of Examination [ ] [ ] [ ]
(iv) Library Facilities [ ] [ ] [ ]
(v) Course Contents [ ] [ ] [ ]


7. In what ways do you think the practical utility of the M.B.A. programme
can be enhanced?
(i) The duration of the course should be 1 year/2 years/3 years/4 years.
(ii) The examination should be held on the basis of the trimester/semester/annual system.
(iii) Greater emphasis should be given to (rank in order of importance):
(a) Lectures from academicians. [ ]
(b) Lectures from industrialists. [ ]
(c) Case discussions. [ ]
(d) Seminars. [ ]
(e) Project report. [ ]
(f) Group discussions/Assignments. [ ]
(g) Individual Assignment. [ ]
(h) Industrial Tours. [ ]
(i) Business Games. [ ]
(j) Role Playing. [ ]
(k) Others (specify). [ ]
(iv) Percentage of marks for internal assessment in each paper should be:
nil [ ]  10% [ ]  15% [ ]  25% [ ]  30% [ ]  40% [ ]  50% [ ]  Above 50% [ ]
(v) What percentage of lectures should be made compulsory to attend?
nil [ ]  25% [ ]  60% [ ]  75% [ ]  80% [ ]  Above 80% [ ]
(vi) What changes should be made in the M.B.A. programme so as to make it
more useful from the practical point of view?
(vii) Any other (specify).


QUESTIONNAIRE NO. 2 
DEPARTMENT OF BUSINESS ADMINISTRATION 
Birla Institute of Technology and Science
Pilani (Rajasthan)
October 11, 1967
Dear Sir/Madam,
We believe that you are a spokesman for the thousands of persons who visit
the SUPER BAZAR in Delhi and, since we are interested in knowing their
preferences, we ask for your help. We believe that this survey would help in
evolving better ways of providing shopping facilities to the consumers. Would
you, therefore, please answer the enclosed questionnaire and return it by post?
A self-addressed stamped envelope is attached herewith for the purpose.
We assure you that we are in no way connected with the SUPER BAZAR.
The information supplied by you will be treated as confidential.
Thanking you,
Yours truly,


SUPER BAZAR 
QUESTIONNAIRE: CONSUMER PREFERENCES

Note: Please put a tick mark (✓) in the square where necessary.
A. General:
Name:
Address:
Age:
Sex: Male [ ]  Female [ ]
Married [ ]  Unmarried [ ]
Occupation: Service [ ]  Business [ ]
No. of members in the family: [...] [ ]  Over 6 [ ]
Monthly income (in rupees)*: [...] [ ]  From 500-1000 [ ]  From 1000-2000 [ ]  Above 2,000 [ ]
* If you are getting exactly Rs. 500, put a tick in the second category, i.e., 500-1000; if
getting exactly Rs. 1,000, put a tick in the third category, i.e., 1000-2000.
B. How many times do you visit the Super Bazar in a month?
1-5 [ ]  5-10 [ ]  Over 10 [ ]

C. Do you prefer any particular day for shopping?
Yes [ ]  No [ ]
If yes, specify the day you prefer.
Sun. [ ]  Mon. [ ]  Tues. [ ]  Wed. [ ]  Thurs. [ ]  Fri. [ ]  Sat. [ ]

Do you buy most of your daily requirements from the Super Bazar? Yes [ ]  No [ ]

Are you attracted to the Bazar because of:
(i) reasonable price? Yes [ ]  No [ ]
(ii) reliability of price? Yes [ ]  No [ ]
(iii) no bargaining? Yes [ ]  No [ ]

1. Do you feel the profit margin charged by the Super Bazar is reasonable? Yes [ ]  No [ ]

Do you prefer the Super Bazar because of:
(i) availability of most of your requirements under one roof? Yes [ ]  No [ ]
(ii) certainty of getting goods? Yes [ ]  No [ ]
(iii) wide selection and choice? Yes [ ]  No [ ]
(iv) availability of goods in short supply? Yes [ ]  No [ ]
(v) availability of quality goods? Yes [ ]  No [ ]
(vi) surety about quality? Yes [ ]  No [ ]
(vii) saving in shopping time? Yes [ ]  No [ ]

Do you find the sales assistants in the Super Bazar:
[Please tick mark (✓) only one of the alternatives]
(i) Attentive [ ]  Inattentive [ ]
(ii) Courteous [ ]  Rude [ ]
(iii) Co-operative [ ]  Unco-operative [ ]
(iv) Efficient [ ]  Inefficient [ ]

1. Do you visit the cafe in the Super Bazar while shopping? Yes [ ]  No [ ]
2. Do you find any difficulty in locating the desired products? Yes [ ]  No [ ]
3. Do you need parking facilities to make your visit easy? Yes [ ]  No [ ]

Do you visit the Super Bazar because:
(i) it is near your home? Yes [ ]  No [ ]
(ii) it is on your way home? Yes [ ]  No [ ]
(iii) it is centrally located? Yes [ ]  No [ ]

Are you a shareholder in the Super Bazar? Yes [ ]  No [ ]
1. Are you attracted by the shareholders' discount? Yes [ ]  No [ ]
2. Do you prefer salesgirls to salesmen? Yes [ ]  No [ ]
3. Are you satisfied with the packing provided by the Super Bazar? Yes [ ]  No [ ]

Can you give your first, second and third preferences for visiting the Super Bazar
out of the following? (Please write numbers in the squares.)
Quality [ ]
Availability of all goods under one roof [ ]
Price [ ]
Service [ ]
Location [ ]
Time-saving [ ]
Shareholding interest [ ]


constantly in a state of production. The sources of secondary data can 
broadly be classified under two heads : 

(1) Published sources, and 

(2) Unpublished sources. 
I. Published Sources

The various sources of published data are:

1. Reports and official publications of:

(a) International bodies such as the International Monetary Fund,
International Finance Corporation and United Nations
Organisation.

(b) Central and State Governments such as the Report of the
Patel Committee and Mahalanobis Committee.

2. Semi-official publications of various local bodies such as
Municipal Corporations and District Boards.

3. Private publications, such as the publications of:

(a) Trade and professional bodies, e.g., the Federation of Indian
Chambers of Commerce and Institute of Chartered
Accountants.

(b) Financial and economic journals such as ‘Commerce’,
‘Capital’ and ‘Indian Finance’.

(c) Annual reports of joint stock companies.

(d) Publications brought out by research institutes and bodies,
research scholars, etc.


It should be noted that the publications mentioned above vary with
regard to the periodicity of publication. Some are published at regular
intervals (yearly, monthly, weekly, etc.) whereas others are ad hoc
publications, i.e., with no regularity about periodicity of publication.


II. Unpublished Sources
EDITING THE DATA

Once data have been obtained either from primary or secondary
sources, the next step in a statistical investigation is to edit the data, i.e.,
to scrutinize them. The chief object of editing is to detect possible
errors and irregularities. The task of editing is a highly specialized one
and requires great care and attention. Negligence in this respect may
render useless the findings of an otherwise valuable study. However, it
should be noted that the work of editing data collected from internal
records and published sources is relatively simple; it is the data collected
from a survey that need extensive editing.

Editing Primary Data 

While editing primary data, the following considerations need
attention:

1. The data should be complete, 

2. The data should be consistent, 

3. The data should be accurate, and 

4. The data should be homogeneous. 

1. Editing for completeness. The editor should see that each schedule
and questionnaire is complete in all respects, i.e., an answer to each and
every question has been furnished. If some questions have not been
answered and those questions are of vital importance, the informants
should be contacted again either personally or through correspondence. It
may happen that in spite of the best efforts a few questions remain
unanswered. In such cases, the editor should mark ‘No report’ in the
space provided for answers, and if the questions are of vital importance,
then the schedule or questionnaire should be dropped.
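The completeness rule above can be expressed as a small editing routine. The sketch below is only an illustration, assuming each schedule is a Python dictionary of answers and that the informants have already been re-contacted; the question names, the set of vital questions and the function `edit_for_completeness` are all hypothetical.

```python
def edit_for_completeness(schedules, questions, vital):
    """Mark unanswered questions 'No report' and drop schedules
    in which any vital question is still unanswered."""
    kept, dropped = [], []
    for schedule in schedules:
        edited = dict(schedule)  # work on a copy; keep the raw schedule intact
        missing = [q for q in questions if edited.get(q) in (None, "")]
        for q in missing:
            edited[q] = "No report"
        if vital.intersection(missing):
            dropped.append(edited)
        else:
            kept.append(edited)
    return kept, dropped


# Hypothetical schedules after the informants have been contacted again.
schedules = [
    {"id": 1, "age": 34, "income": 900, "religion": "Sikh"},
    {"id": 2, "age": 41, "income": None, "religion": "Hindu"},
    {"id": 3, "age": 28, "income": 1200, "religion": ""},
]
questions = ["age", "income", "religion"]
vital = {"age", "income"}  # assumed to be the questions of vital importance

kept, dropped = edit_for_completeness(schedules, questions, vital)
print([s["id"] for s in kept], [s["id"] for s in dropped])
```

Here schedule 3 is kept with ‘No report’ entered against religion, while schedule 2 is dropped because a vital question (income) remains unanswered.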

2. Editing for consistency. While editing the data for consistency,
the editor should see that the answers to questions are not contradictory
in nature. If there are mutually contradictory answers, he should try to
obtain the correct answers either by referring back to the questionnaire or by
contacting, wherever possible, the informant in person. For example, if
amongst others, two questions in a questionnaire are: (a) Are you
married? (b) State the number of children you have, and the reply to the
former question is ‘no’ and to the latter ‘three’, then there is a
contradiction and it should be clarified.
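A consistency check of this kind can be written as a set of rules, each of which flags a pair of answers that contradict one another. The Python sketch below assumes hypothetical field names (`married`, `children`, `wife_employed`); the first rule encodes the example above and the second is an illustrative rule of the same type, modelled on the ‘If Married’ questions of Questionnaire No. 1. Flagged schedules are meant to be referred back or clarified with the informant, not corrected automatically.

```python
def find_contradictions(schedule):
    """Return a list of contradictions found in one schedule."""
    problems = []
    unmarried = schedule.get("married") == "No"
    # Rule from the example: 'not married' but a positive number of children.
    if unmarried and (schedule.get("children") or 0) > 0:
        problems.append("unmarried but reports children")
    # Illustrative rule: a question meant only for married informants answered.
    if unmarried and schedule.get("wife_employed") is not None:
        problems.append("unmarried but answers questions about wife")
    return problems


for schedule in [
    {"id": 1, "married": "No", "children": 3},
    {"id": 2, "married": "Yes", "children": 3, "wife_employed": "No"},
    {"id": 3, "married": "No", "children": 0, "wife_employed": "Yes"},
]:
    print(schedule["id"], find_contradictions(schedule))
```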

3. Editing for accuracy. The reliability of conclusions depends
basically on the correctness of information. If the information supplied is
wrong, conclusions can never be valid. It is, therefore, necessary for the
editor to see that the information is accurate in all respects. However,
this is one of the most difficult tasks of the editor. If the inaccuracy is
due to arithmetical errors, it can be easily detected and corrected. But if
the cause of inaccuracy is faulty information supplied, it may be difficult
to verify it, e.g., information relating to income, age, etc.
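Arithmetical errors are the easy case because they can be detected mechanically. The sketch below assumes, hypothetically, that each schedule records a basic monthly income, allowances and a reported total; it recomputes the total and flags any schedule where the reported figure does not agree. It cannot, of course, detect faulty information that happens to be arithmetically consistent.

```python
def arithmetic_errors(records, tolerance=0.0):
    """Return (id, reported_total, computed_total) for each record whose
    reported total differs from basic + allowances by more than `tolerance`."""
    flagged = []
    for rec in records:
        computed = rec["basic"] + rec["allowances"]
        if abs(computed - rec["reported_total"]) > tolerance:
            flagged.append((rec["id"], rec["reported_total"], computed))
    return flagged


# Hypothetical income figures (in rupees) taken from three schedules.
records = [
    {"id": 1, "basic": 800, "allowances": 200, "reported_total": 1000},
    {"id": 2, "basic": 650, "allowances": 150, "reported_total": 850},
    {"id": 3, "basic": 1200, "allowances": 300, "reported_total": 1500},
]
print(arithmetic_errors(records))
```

The editor would then correct schedule 2, or confirm with the informant which of the figures is right.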

4. Editing for homogeneity. By homogeneity we mean the condition
in which all the questions have been understood in the same sense. The
editor must check all the questions for uniform interpretation. For example,
as to the question of income, if some informants have given monthly
income, others annual income and still others weekly or even
daily income, no comparison can be made. Similarly, if some persons
have given the basic income whereas others the total income, no
comparison is possible. The editor should check that the information
supplied by the various people is homogeneous and uniform.
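One way to restore homogeneity in the income example is to convert every answer to the same basis before comparing. The Python sketch below assumes, as an illustration, that the editor converts to a monthly basis using 12 months, 52 weeks and 365 days in a year; these conventions and the function name `to_monthly` are assumptions, and other conventions (for example, counting only working days) would give different figures. Basic versus total income cannot be reconciled this way; such answers have to be referred back to the informant.

```python
# Number of each reporting period in one year (assumed conventions).
PERIODS_PER_YEAR = {"annual": 1, "monthly": 12, "weekly": 52, "daily": 365}


def to_monthly(amount, period):
    """Convert an income reported for `period` to a monthly figure."""
    if period not in PERIODS_PER_YEAR:
        raise ValueError(f"unknown reporting period: {period!r}")
    annual = amount * PERIODS_PER_YEAR[period]
    return annual / 12


# Hypothetical answers to the income question, each on a different basis.
answers = [(12000, "annual"), (1000, "monthly"), (250, "weekly"), (40, "daily")]
for amount, period in answers:
    print(f"{amount} ({period}) -> {to_monthly(amount, period):.2f} per month")
```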


dary data. In this context, Prof. Bowley rightly points out that "secondary
data should not be accepted at their face value." The reason is that such
data may be erroneous in many respects due to bias, inadequate size of
the sample, substitution, errors of definition, arithmetical errors, etc. Even
if there is no error, such data may not be suitable and adequate for the
purpose of the enquiry. Hence, before using such data, the investigator
should consider the following aspects:

1. Whether the data are suitable for the purpose of investigation in
view. Before using [...] data are suitable for the purpose of the enquiry.
The suitability of data can be judged in the light of the nature and scope
of investigation. For [...] allowances of workers and the data relate to
basic wages alone, such data would not be suitable for the immediate
purpose. It may be difficult to [...] light of the time period for which the
data are available. For example, for studying the trend of prices we may use
data for the last 8-10 years, but from the source known to us data may be
available for the last 2-3 years [...]


3. Whether the data are reliable. It is very difficult to find out
whether the secondary data are reliable or not. The following tests, if
applied, may be helpful in determining how far the given data are reliable:

(i) Was the collecting agency unbiased or did it "have an axe to
grind"?

(ii) If the enumeration was based on a sample, was the sample
representative?

(iii) Were the enumerators capable and properly trained? Incompetent
or poorly trained enumerators cannot be depended upon to
produce useful results.

(iv) Was there a proper check on the accuracy of field work?

(v) Was the editing, tabulating and analysis carefully and con-


Sampling and Sample Designs


The need for adequate and reliable data is ever increasing for taking
policy decisions in different fields of human activity. There are two ways
in which the required information may be obtained:

1. Complete enumeration survey; and 

2. Sampling technique. 

Under the complete enumeration survey method,* data are collected for
each and every unit (person, household, field, shop, factory, etc., as the
case may be) of the population or universe,** which is the complete set of
items which are of interest in any particular situation. For example, if
the average weight of the students of Delhi University is to be obtained,
then all the students in different colleges will be weighed and the average
weight will be obtained by dividing the total weight of all the students by
the number of students.


The advantage of this type of survey will be that no unit is left out 
and hence greater accuracy may be ensured. However, the effort, money 
and time required for carrying out complete enumeration will generally be 
extremely large and in many cases cost may be so prohibitive that the very 
idea of collecting information may be dropped. Hence, in modern times 
very little use is made of complete enumeration survey. How to collect 
the data then ? It is through the adoption of sampling technique that a 
large mass of data pertaining to different aspects of human activity is 
collected these days. 

Sampling Procedure 

Sampling is simply the process of learning about the population
on the basis of a sample drawn from it. Thus, in the sampling technique,
instead of every unit of the universe, only a part of the universe is


* This method is also known as the census method.

** The word ‘universe’ as used in Statistics denotes the aggregate from which
the sample is to be taken. For example, if there are 1,00,000 students in Delhi
University and a sample of 5,000 students is taken to study their attitudes
towards the semester system, then the 1,00,000 students constitute the universe
and 5,000 is the sample size. It should be noted that the universe need not
consist of persons; it may consist of any objects. For example, if one is
interested in the number of cars and buses in Delhi, then his ‘universe’ will
comprise the total number of cars and buses.

The universe may be either finite or infinite. A finite universe is one in which
the number of items is determinable, such as the number of university students in
Delhi or in India. An infinite universe is that in which the number of items cannot
be determined, such as the number of stars in the sky. In some cases, the universe
is so large that for all practical purposes it is regarded as infinite, such as the
number of leaves on a tree.
studied and the conclusions are drawn on that basis for the entire universe. The
process of sampling involves three elements:

(a) selecting the sample,
(b) collecting the information, and
(c) making an inference about the population.

The three elements cannot generally be considered in isolation from
one another. Sample selection, data collection and estimation are all
interwoven and each has an impact on the others. Sampling is not haphazard
selection; it embodies definite rules for selecting the sample. But
having followed a set of rules for sample selection, we cannot consider
the estimation process independently of it; estimation is guided by the
manner in which the sample has been selected.


Although much of the development in the theory of sampling has
taken place only in recent years, the idea of sampling is quite old. Since
time immemorial people have examined a handful of grains to ascertain
the quality of the entire lot. A housewife examines only two or three
grains of boiling rice to know whether the pot of rice is ready or not. A
doctor examines a few drops of blood and draws conclusions about the
blood constitution of the whole body. A businessman places orders for
materials by examining only a small sample of the same. A teacher may
put questions to one or two students and find out whether the class as a
whole is following the lesson. In fact, there is hardly any field where the
technique of sampling is not used either consciously or unconsciously.


Purpose of Sampling. A sample is not studied for its own sake. The
basic objective of its study is to draw inferences about the population. In
other words, sampling is only a tool which helps to know the characteristics
of the universe or population by examining only a small part of it.
The values obtained from the study of a sample, such as the average and
dispersion, are known as ‘statistics’. On the other hand, such values for the
population are called ‘parameters’.
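The distinction between a parameter and a statistic can be seen in a tiny example. The Python sketch below uses a hypothetical universe of ten students' heights; the mean and standard deviation of the whole universe are parameters, while the same measures computed from a simple random sample of four are statistics, whose values change from one sample to another.

```python
import random
import statistics

# Hypothetical universe: heights (cm) of the 10 students in a small class.
universe = [152, 155, 157, 158, 160, 161, 163, 164, 166, 169]

# Parameters: computed from every unit of the universe.
population_mean = statistics.mean(universe)
population_sd = statistics.pstdev(universe)

# Statistics: computed from a simple random sample of 4 students.
rng = random.Random(1)  # fixed seed only so that the run is repeatable
sample = rng.sample(universe, 4)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

print(f"parameters: mean {population_mean:.1f} cm, s.d. {population_sd:.2f} cm")
print(f"sample {sample}: mean {sample_mean:.1f} cm, s.d. {sample_sd:.2f} cm")
```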

THEORETICAL BASIS OF SAMPLING

On the basis of a sample study we can predict and generalize the
behaviour of mass phenomena. This is possible because there is no
statistical population whose elements would vary from each other without
limit. For example, wheat varies to a limited extent in colour, protein
content, length, weight, etc., but it can always be identified as wheat.
Similarly, apples of the same tree may vary in size, colour, taste, weight,
etc., but they can always be identified as apples. Thus we find that
although diversity is a universal quality of mass data, every population has
characteristic properties with limited variation. This makes it possible to
select a relatively small unbiased random sample that can portray fairly
well the traits of the population.
There are two important principles on which the theory of sampling
is based:

1. Principle of ‘Statistical Regularity’, and

2. Principle of 'Inertia of Large Numbers'. 

These principles are sometimes referred to as the laws of sampling. 
However, they are not laws in the strict sense of the term; they are, 
rather, tendencies which operate universally. 
Principle of Statistical Regularity


The principle is derived from the mathematical theory of probability.
In the words of King, “The law of statistical regularity lays down that a
moderately large number of items chosen at random from a large group are
almost sure on the average to possess the characteristics of the large group.”
In other words, this principle points out that if a sample is taken at
random from a population, it is likely to possess almost the same
characteristics as the population. This principle directs our attention to one
very important point, that is, the desirability of choosing the sample at
random.


By random selection we mean a selection where each and every item
of the population has an equal chance of being selected in the sample. In
other words, the selection must not be made by deliberate exercise of one’s
discretion. A sample selected in this manner would tend to be representative
of the population. If this condition is satisfied, it is possible for one to depict
fairly accurately the characteristics of the population by studying only a
part of it. Hence, this principle is of great practical significance because
it makes possible a considerable reduction of the work necessary before
any conclusion is drawn regarding a large universe. For example, if one
intends to make a study of the average height of the students of Delhi
University, it is not necessary to measure the height of each and every
student. A few students may be selected at random from every college,
their heights measured and the average height of university students in
general may be inferred.


It should be noted that the results derived from sample data may be
different from those of the population. This is for the simple reason that
the sample is only a part of the whole universe. For example, the average
height of the students of Delhi University may come out to be 160 cm by the
census method whereas it may be 159 cm or 161 cm for the sample taken.
It would be just a coincidence if the height came out to be exactly 160
cm under both methods. However, there would not be much difference in
the results if the sample is selected at random.
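The point that sample results differ slightly from the census result, but not by much when the selection is random, can be checked by simulation. The Python sketch below builds a hypothetical universe of 20,000 heights (drawn, purely as an assumption, from a normal distribution centred on 160 cm), computes the census mean, and then draws five independent simple random samples of 400 students each.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so that the run is repeatable

# Hypothetical universe: heights (cm) of 20,000 students.
universe = [rng.gauss(160, 7) for _ in range(20_000)]
census_mean = statistics.mean(universe)

# Five independent simple random samples of 400 students each.
sample_means = [statistics.mean(rng.sample(universe, 400)) for _ in range(5)]

print(f"census mean: {census_mean:.2f} cm")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean {m:.2f} cm (difference {m - census_mean:+.2f} cm)")
```

None of the sample means is likely to equal the census mean exactly, but with samples of this size each typically lies within about a centimetre of it.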


Principle of Inertia of Large Numbers

This principle is a corollary of the principle of statistical regularity.
It is of great significance in the theory of sampling. It states that, other
things being equal, the larger the size of the sample, the more accurate the
results are likely to be. This is because large numbers are more stable as
compared to small ones. When the number of items in the sample is large, the
variations in the component parts tend to balance each other and, therefore,
the variation in the aggregate is insignificant. For example, if a coin is
tossed 10 times, we should expect an equal number of heads and tails, i.e.,
5 each. But since the experiment is tried only a small number of times, it is
likely that we may not get exactly 5 heads and 5 tails. The result may be a
combination of 9 heads and 1 tail, or 8 heads and 2 tails, or 7 heads and
3 tails. If the same experiment is carried out 1,000 times, the result would
be very near to 50% heads and 50% tails, even though exactly 500 heads and
500 tails is itself not very likely. The basic reason for this is that the
experiment has been carried out a sufficiently large number of times and the
possibility of variations in one direction compensating for those in a
different direction is greater. If at one time we get 5 heads in succession,
at other times we may get 5 tails in succession, and so on, and for the
experiment as a whole, the proportions of heads and tails may be more or less
equal. Similarly, if it is intended to study the variation in the production
of rice over a number of years and data are collected from one or two States
only, the result would reflect large variations in production due to the
favourable or unfavourable factors in operation. If, on the other hand,
figures of production are collected for all the States in India, it is quite
likely that we find little variation in the aggregate. This does not mean that
the production would remain constant for all the years. It only implies that
the changes in the production of the individual States will be
counter-balanced so as to reflect smaller variations in production for the
country as a whole.
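The coin-tossing illustration is easy to reproduce by simulation. The Python sketch below (an illustration only; the function name and seeds are arbitrary choices) repeats each experiment five times for several numbers of tosses and reports how far the proportion of heads wanders. It also computes the exact probability of getting exactly 500 heads in 1,000 tosses of a fair coin, which shows that it is the proportion, not the exact 50:50 split, that becomes stable.

```python
import math
import random


def proportion_of_heads(tosses, rng):
    """Toss a fair coin `tosses` times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


rng = random.Random(7)  # fixed seed so that the run is repeatable
for n in (10, 100, 1_000, 100_000):
    # Repeat each experiment 5 times to see how much the proportion varies.
    shares = [proportion_of_heads(n, rng) for _ in range(5)]
    spread = max(shares) - min(shares)
    print(f"{n:>7} tosses: {[round(s, 3) for s in shares]} (spread {spread:.3f})")

# Exact binomial probability of exactly 500 heads in 1,000 tosses.
p_exact = math.comb(1000, 500) / 2**1000
print(f"P(exactly 500 heads in 1,000 tosses) = {p_exact:.4f}")
```

The spread shrinks markedly as the number of tosses grows, while the probability of an exact 500-500 split is only about 2.5%.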
METHODS OF SAMPLING 

The various methods of sampling or different sampling designs can
be grouped under two broad heads: random sampling and non-random
sampling. Random sampling is also referred to as probability sampling
since, if the sampling process is random, the laws of probability can be
applied. It may be noted that the term random sample is not used to
describe the data in the sample but the process employed to select the
sample. Randomness is thus a property of the sampling procedure rather
than of an individual sample. As such, randomness can enter the sampling
process in a number of ways and hence random samples may be of many kinds.
Advantages of Probability Sampling 

The following are the basic advantages of probability sampling : 

(1) Probability sampling does not depend upon the existence of 
detailed information about the universe for its effectiveness. 

(2) Probability sampling provides estimates which are essentially
unbiased and have measurable precision.

(3) It is possible to evaluate the relative efficiency of various sample 
designs only when probability sampling is used. 

Limitations of Probability Sampling 

Despite the great advantages of probability sampling techniques
mentioned above, they have certain limitations because of which
non-probability sampling is quite often used in practice. These limitations are:

(1) Probability sampling requires a very high level of skill and
experience for its use.

(2) It requires a lot of time to plan and execute a probability
sample.

(3) The costs involved in probability sampling are larger as
compared to non-probability sampling.

Non-random sampling is a process of sample selection without the
use of randomization. In other words, a non-random sample is
selected on a basis other than probability considerations, such as
convenience, expert judgment, etc.

The most important difference between random and non-random 
sampling is that whereas the pattern of sampling variability can be ascer- 
tained in case of random sampling, in non-random sampling, there is no 
way of knowing the pattern of variability in the process. 
Some of the important sampling methods that are popularly used in
practice are given below:


A. Random Sampling Methods:
(a) Simple or unrestricted random sampling ; and 
(b) Restricted random sampling : 


(i) Stratified sampling, 
(ii) Systematic sampling, 
(iii) Cluster sampling. 
B. Non-Random Sampling Methods : 
(i) Judgment sampling ; 
(ii) Convenience Sampling ; and 
(iii) Quota sampling 
A brief description of these methods is given below. 


A. RANDOM SAMPLING METHODS 
(a) Simple or Unrestricted Random Sampling.* 


Simple random sampling refers to the sampling technique in which
each and every item of the population has an equal and independent
chance of being included in the sample. The selection is thus free from
personal bias because the investigator does not exercise his discretion or
preference in the choice of items. Some people believe that randomness
of selection can be achieved by unsystematic and haphazard procedures,
but this is quite wrong. The point to be emphasized is that unless
precautions are taken to avoid bias and a conscious effort is made to
ensure the operation of chance factors, the resulting sample will not be
a random sample.

Theoretically, in the selection of a simple random sample every
individual item drawn must be recorded, measured and returned to the
population before another selection is made. This can lead to selection of
the same item more than once in the same sample. In practice, however, this
procedure is rarely followed, for two reasons. First, very
often we do not want the same item to be selected more than once;
and secondly, if the population is large compared with the sample, the error
made by not returning the individual item is of no practical significance.
It may be noted that when each item drawn is replaced, the probability of
any element being drawn at each draw is 1/N, whereas without replacement the
probability of selecting each subsequent item increases, since N decreases by the
number of items previously drawn. Hence, if the universe is very small,
it is highly desirable that every selected item be returned to the population
before another selection is made.
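The distinction between drawing with and without replacement can be sketched with Python's standard `random` module, where `choices` draws with replacement and `sample` draws without. The helper `draw_probabilities` and the ten-item universe are illustrative assumptions, not part of the original text; the helper simply lists the chance that a given remaining item is picked at each successive draw.

```python
import random
from fractions import Fraction

def draw_probabilities(N, n):
    """Chance that a particular remaining item is picked at each of n draws."""
    # With replacement the universe is restored after every draw: always 1/N.
    with_repl = [Fraction(1, N) for _ in range(n)]
    # Without replacement the pool shrinks by one each time: 1/N, 1/(N-1), ...
    without_repl = [Fraction(1, N - i) for i in range(n)]
    return with_repl, without_repl

population = list(range(1, 11))   # hypothetical universe of N = 10 numbered items
rng = random.Random(42)

with_replacement = rng.choices(population, k=5)       # the same item may recur
without_replacement = rng.sample(population, k=5)     # all five items distinct

print(with_replacement, without_replacement)
print(draw_probabilities(10, 3))
```

With a large universe the two lists of probabilities differ very little, which is the practical reason given above for not returning items.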

To ensure randomness of selection, one may adopt either the lottery
method or consult a table of random numbers.

Lottery Method. This is a very popular method of taking a random
sample. Under this method, all items of the universe are numbered or
named on separate slips of paper of identical size and shape. These
slips are then folded and mixed up in a container or drum. A blindfold
selection is then made of the number of slips required to constitute the
desired size of sample. The selection of items thus depends entirely on
chance. The method will be clear from an example.
If we want to take a sample of 10 persons out of a population of 100, the
procedure is to write the names of all the 100 persons on separate slips of
paper, fold these slips, mix them thoroughly and then make a blindfold
selection of 10 slips.

*Simple random samples are characterised by the way in which they are
selected. 'Random' is not used here in the sense of 'haphazard' or 'hit-or-miss'.
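A minimal simulation of the lottery method, assuming the 100 names are hypothetical placeholders: shuffling the list plays the part of mixing the slips in the drum, and taking the first n slips is the blindfold draw. The function name `lottery_draw` is illustrative only.

```python
import random

def lottery_draw(names, n, seed=None):
    """Put every name on a slip, mix the slips, and draw n of them blindly."""
    slips = list(names)          # one identical slip per person
    rng = random.Random(seed)
    rng.shuffle(slips)           # "mix them thoroughly"
    return slips[:n]             # blindfold selection of n slips

persons = [f"Person {i}" for i in range(1, 101)]   # hypothetical population of 100
chosen = lottery_draw(persons, 10, seed=7)
print(chosen)
```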

The above method is very popular in lottery draws where a decision
about prizes is to be made. However, while adopting the lottery method it is
absolutely essential to see that the slips are of identical size, shape and
colour; otherwise there is a considerable possibility of personal prejudice and
bias affecting the results.

Table of Random Numbers. The lottery method discussed above
becomes quite cumbersome to use as the size of population increases. An
alternative method of random selection is that of using a table of random
numbers.

The random numbers are generally obtained by some mechanism
which, when repeated a large number of times, ensures approximately equal
frequencies for the numbers from 0 to 9 and also proper frequencies for
various combinations of numbers (such as 00, 01,......99 ; 000, 001....999 ;
etc.) that could be expected in a random sequence of the digits 0 to 9.
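One of the simplest checks implied by the paragraph above is whether each digit 0 to 9 occurs about equally often. The sketch below, a hypothetical helper rather than any published test procedure, computes the chi-square statistic for single-digit frequencies; with 10 digits (9 degrees of freedom) a value above roughly 16.92 would cast doubt on randomness at the 5% level. The tables listed below were also tested for pairs, gaps and runs, which this sketch does not cover. It assumes the input string contains digits only.

```python
from collections import Counter

def digit_frequency_chi_square(digits: str) -> float:
    """Chi-square statistic comparing observed digit counts with equal expected counts."""
    counts = Counter(digits)
    expected = len(digits) / 10          # equal share for each digit 0-9
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))

print(digit_frequency_chi_square("0123456789" * 100))  # 0.0 (perfectly even)
print(digit_frequency_chi_square("1" * 1000))          # 9000.0 (badly skewed)
```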

Several standard tables of random numbers are available, among
which the following may be specially mentioned, as they have been tested
extensively for randomness:

(1) Tippett's (1927) random number tables consisting of 41,600
random digits grouped into 10,400 sets of four-digited random numbers ;

(2) Fisher and Yates (1938) table of random numbers with 15,000
random digits arranged into 1,500 sets of ten-digited random numbers ;

(3) Kendall and B.B. Smith (1939) table of random numbers having
1,00,000 random digits grouped into 25,000 sets of four-digited random
numbers ;

(4) Rand Corporation (1955) table of random numbers consisting of
10,00,000 random digits grouped into 2,00,000 sets of five-digited random
numbers ; and

(5) C.R. Rao, Mitra and Matthai (1966) table of random numbers
with 20,000 random digits grouped into 5,000 sets of four-digited random
numbers.

Tippett's table of random numbers is the most popular in practice.
We give below the first forty sets from Tippett’s table as an illustration of 
the general appearance of random numbers : 


2952 6641 3992 9792 7969 5911 3170 5624
4167 .... 1545 1396 7203 5356 1300 2693
0560 5246 1112 6107 6008 8125 4233 8776
2754 9143 1405 9025 7002 6111 8816 6446
It is important that the starting point in the table of random numbers
be selected in some random fashion so that every unit has an equal chance
of being selected.

One may question, and quite rightly, how it is ensured that
these digits are random. It may be pointed out that the digits in the table
were chosen haphazardly, but the real guarantee of their randomness lies in
practical tests. Tippett's numbers have been subjected to numerous tests
and used in many investigations, and their randomness has been well
established for all practical purposes. An example to illustrate how Tippett's
table of random numbers may be used is given below. Suppose we have
to select 20 items out of 6,000. The procedure is to number all the 6,000
items from 1 to 6,000. A page from Tippett's table may then be consulted
and the first twenty numbers up to 6,000 noted down. Items bearing
those numbers will be included in the sample. Making use of the portion
of the table given above, the required numbers are :


2952 3992 5911 3170 5624 4167
1545 1396 5356 1300 2693 2370
3408 2762 3563 1089 0560 5246
1112 4233


The items which bear the above numbers constitute the sample. 
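The reading rule used above can be sketched as a small function. It assumes that numbers above the universe size, the number 0000, and repeats are simply skipped (skipping repeats is an assumption for sampling without replacement; the text does not discuss it). Applied to the first row of the table it reproduces the first five numbers of the sample, and the same function covers a universe of 400 items numbered 0001 to 0400. The name `select_from_table` is illustrative.

```python
def select_from_table(table_numbers, universe_size, sample_size):
    """Read four-digit random numbers in order, keeping those from 1 to
    universe_size and skipping repeats, until the sample is complete."""
    chosen = []
    for number in table_numbers:
        value = int(number)
        if 1 <= value <= universe_size and value not in chosen:
            chosen.append(value)
            if len(chosen) == sample_size:
                break
    return chosen

first_row = ["2952", "6641", "3992", "9792", "7969", "5911", "3170", "5624"]
print(select_from_table(first_row, 6000, 20))  # [2952, 3992, 5911, 3170, 5624]
```

If the table runs out before the sample is complete, the function returns what it has, and reading would continue on the next row or page.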

Universe size less than 1,000. If the size of the universe is less than 1,000,
the procedure will be different, as Tippett's numbers are available only in
four figures. Thus, for example, if it is desired to take a sample of 10
items out of 400, all items from 1 to 400 should be numbered as 0001 to
0400. We may now select 10 numbers from the table which are up
to 0400.

Universe size less than 100. If the size of the universe is less than 100,
the table is used as follows. Suppose ten numbers from 0 to 80 are
required. We start anywhere in the table and write down the numbers in
pairs. The table can be read horizontally, vertically, diagonally, or in any
methodical way. Starting with the first number and reading horizontally (see
the portion of Tippett's table reproduced above), we obtain 29, 52, 66, 41, 39,
92, 97, 92, 79, 69, 59, 11, 31, 70, 56, 24, 41, 67, and so on. Ignoring the
numbers greater than 80, we obtain for our purpose ten random numbers,
namely, 29, 52, 66, 41, 39, 79, 69, 59, 11 and 31.
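The pair-reading procedure just described can be sketched as follows: the four-digit sets are joined into one digit stream, cut into two-digit numbers, and those above the limit are ignored. The function name `two_digit_numbers` is illustrative; repeats are kept, as in the text.

```python
def two_digit_numbers(rows, limit, count):
    """Read a digit stream in pairs, keep pairs <= limit, return the first `count`."""
    digits = "".join(rows)
    pairs = [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]
    return [p for p in pairs if p <= limit][:count]

# First nine four-digit sets of Tippett's table, read horizontally
rows = ["2952", "6641", "3992", "9792", "7969", "5911", "3170", "5624", "4167"]
print(two_digit_numbers(rows, 80, 10))
# [29, 52, 66, 41, 39, 79, 69, 59, 11, 31]
```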

Fisher and Yates's tables consist of 15,000 digits. These have been
arranged as two-digit numbers in 300 blocks, each block consisting of 5 rows and
5 columns. Kendall and Smith also constructed random numbers (1,00,000 digits
in all) by using a randomising machine. However, this method of
random selection cannot be followed in the case of articles like ghee, oil,
petrol, wheat, etc., since such bulk materials cannot be numbered item by item.

Merits. 1. Since the selection of items in the sample depends
entirely on chance, there is no possibility of personal bias affecting the
results.

2. As compared to judgment sampling, a random sample represents
the universe in a better way. As the size of the sample increases, it becomes
increasingly representative of the population.

3. The analyst can easily assess the accuracy of his estimate
because sampling errors follow the laws of chance. The theory of
random sampling is more highly developed than that of any other type of
sampling, which enables the surveyor to provide the most reliable
information at the least cost.
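The claim that sampling errors follow the laws of chance, and that a larger random sample represents the universe better, can be checked by simulation. The sketch below draws repeated simple random samples from a hypothetical universe (10,000 normally distributed values with mean 50 and standard deviation 10, an assumption chosen only for illustration) and measures how widely the sample means scatter; the scatter should shrink roughly in proportion to 1/sqrt(n).

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical universe of 10,000 values (mean 50, standard deviation 10)
population = [rng.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repetitions=1000):
    """Standard deviation of the means of many simple random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)

results = {n: spread_of_sample_means(n) for n in (10, 100, 1000)}
for n, spread in results.items():
    print(n, round(spread, 3))
```

Because the scatter is predictable from the sample size, the analyst can state in advance how accurate a random sample of a given size is likely to be.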

Limitations. 1. The use of simple random sampling necessitates
a completely catalogued universe from which to draw the sample. But it
is often difficult for the investigator to have up-to-date lists of all the items
of the population to be sampled. This restricts the use of this method
in economic and business data, where very often we have to employ
restricted random sampling designs.
2. The size of the sample required to ensure statistical reliability is
usually larger under random sampling than in stratified sampling.

3. From the point of view of field survey, it has been claimed
that cases selected by random sampling tend to be too widely dispersed
geographically and that the time and cost of collecting data become too
large.

4. Random sampling may produce the most non-random-looking
results. For example, thirteen cards from a well-shuffled pack of playing
cards may all be of one suit, but the probability of this type of
occurrence is very, very small.
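The chance of the one-suit hand mentioned above can be worked out directly: there are C(52, 13) possible hands of thirteen cards, and only 4 of them consist entirely of a single suit. A short check with Python's `math.comb`:

```python
from math import comb

hands = comb(52, 13)        # all possible 13-card hands from a pack of 52
one_suit_hands = 4          # one such hand for each suit
probability = one_suit_hands / hands
print(hands, probability)
```

The probability comes to about 6.3 x 10^-12, which is why such an outcome, though possible, is practically never met.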


(b) Restricted Random Sampling
(i) Stratified Sampling 


When this method of sampling is adopted, the population is divided
into homogeneous groups or classes called strata, and a sample is drawn
from each stratum at random. For example, if we are interested in
studying the consumption pattern of the people of Delhi, the city of Delhi

A stratified sample may be either proportional or disproportionate.

In a proportional stratified sampling plan, the number of items drawn
from each stratum is proportional to the size of the stratum. For example,
if the population is divided into five groups, their respective sizes being
10, 15, 20, 30 and 25 per cent of the population, and a sample of 5,000 is
drawn, the desired proportional sample may be obtained in the following
manner :

From stratum one     5,000 (0.10) =   500
From stratum two     5,000 (0.15) =   750
From stratum three   5,000 (0.20) = 1,000
From stratum four    5,000 (0.30) = 1,500
From stratum five    5,000 (0.25) = 1,250

Size of the entire sample         = 5,000
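The allocation in the example above can be sketched in a few lines; the function name `proportional_allocation` is illustrative. Each stratum's share of the sample equals its percentage of the population.

```python
def proportional_allocation(sample_size, stratum_percentages):
    """Items to draw from each stratum, in proportion to its share (%) of the population."""
    return [sample_size * pct // 100 for pct in stratum_percentages]

print(proportional_allocation(5000, [10, 15, 20, 30, 25]))
# [500, 750, 1000, 1500, 1250]
```

Integer division is exact here because every share is a whole number; with other percentages the rounded allocations may need adjusting so that they add up to the total sample size.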


Proportional stratification yields a sample that represents the universe
with respect to the proportion in each stratum in the population. This
procedure is satisfactory if there is no great difference in dispersion from
stratum to stratum. But it is certainly not the most efficient procedure,
especially when there is considerable variation in different strata. This
indicates that in order to obtain maximum efficiency in stratification, we
should assign greater representation to a stratum with a larger dispersion
and smaller representation to one with small variation.
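The principle just stated, giving more representation to strata with larger dispersion, is commonly formalised as Neyman (optimum) allocation, in which the sample for stratum h is proportional to N_h x S_h (stratum size times stratum standard deviation). This technique is not named in the text, and the stratum sizes and standard deviations below are hypothetical. When all strata are equally variable it reduces to proportional allocation.

```python
def neyman_allocation(sample_size, stratum_sizes, stratum_sds):
    """Allocate the sample in proportion to N_h * S_h for each stratum."""
    weights = [N * s for N, s in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [round(sample_size * w / total) for w in weights]

sizes = [1000, 1500, 2000, 3000, 2500]   # hypothetical: 10/15/20/30/25 % split
equal_sds = [5, 5, 5, 5, 5]               # all strata equally variable
unequal_sds = [20, 5, 5, 5, 5]            # stratum one far more variable

print(neyman_allocation(5000, sizes, equal_sds))    # same as proportional
print(neyman_allocation(5000, sizes, unequal_sds))  # stratum one gets more
```

Rounding can make the allocations differ from the total by one or two in other cases; in practice the largest stratum is usually adjusted.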

In disproportionate stratified sampling an equal number of cases 
is taken from each stratum regardless of how the stratum is represented 
in the universe. Thus, in the above example, an equal number of items 
(1,000) from each stratum may be drawn. 
Merits. 1. More representative. Since the population is first
divided into various strata and then a sample is drawn from each stratum,
there is little possibility of any essential group of the population being
completely excluded. A more representative sample is thus secured.
Stratified sampling is frequently regarded as the most efficient system of
sampling.

2. Greater accuracy. Stratified sampling ensures greater accuracy. 
The accuracy is maximum if each stratum is so formed that it consists of 
uniform or homogeneous items. 


3. Greater geographical concentration. As compared with a simple random
sample, stratified samples can be more concentrated geographically.
Thus the time and expense of interviewing may be considerably reduced.

Limitations. 1. Utmost care must be exercised in dividing the
population into various strata. Each stratum must contain, as far as
possible, homogeneous items, as otherwise the results may not be reliable.
However, this is a very difficult task and may involve considerable time 
and expense. 

2. The items from each stratum should be selected at random. But 
this may be difficult to achieve in the absence of skilled sampling super- 
visors and a random selection within each stratum may not be ensured. 
(ii) Systematic Sampling

This method is popularly used in those cases where a complete list
of the population from which the sample is to be drawn is available. The
method is to select every kth* item from the list, where 'k' refers to the
sampling interval. The first item, between the first and the kth, is selected
at random. For example, if a complete list of 1,000 students of a college
is available and we want to draw a sample of 200, we must
take every fifth item (i.e., k = 5). The first item, between one and five,
shall be selected at random. Suppose it comes out to be three. Now we
shall go on adding five to obtain the numbers of the desired sample. Thus,
the second item would be the 8th student; the third, the 13th student; the
fourth, the 18th student; and so on.
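The selection rule in the example can be sketched as follows. The sketch assumes the universe size is a whole multiple of the sample size (as with 1,000 and 200), so that k = N / n; the function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(universe_size, sample_size, start=None, seed=None):
    """Every kth item after a random start between 1 and k (k = N // n)."""
    k = universe_size // sample_size                     # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)        # random start, 1..k inclusive
    return list(range(start, universe_size + 1, k))[:sample_size]

sample = systematic_sample(1000, 200, start=3)
print(sample[:4], len(sample))   # [3, 8, 13, 18] 200
```

Only the start is chosen by chance; once it is fixed, every other member of the sample follows automatically.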

Systematic sampling is a relatively simple technique and may be
more efficient than simple random sampling provided the lists are arranged
wholly at random. However, this requirement is rarely fulfilled.
The nearest approach to randomness is provided by alphabetical lists such
as are found in a telephone directory, although even these may have certain
non-random characteristics.

Merits. The systematic sampling design is simple and convenient to 
adopt. The time and work involved in sampling by this method are 
relatively smaller. The results obtained are also found to be generally
satisfactory provided care is taken to see that there are no periodic fea- 
tures associated with the sampling interval. If populations are sufficiently 
large, systematic sampling can often be expected to yield results similar 
to those obtained by proportional stratified sampling. 


Limitations. The main limitation of the method is that it becomes
less representative if we are dealing with populations having hidden
periodicities. Also, if the population is ordered in a systematic way with
respect to the characteristics the investigator is interested in, then it is
possible that only certain types of items will be included in the sample,
or at least more of certain types than others. For instance, in a study of
workers' wages, the list may be such that every tenth worker on the list
gets wages above Rs. 150 per month; with a sampling interval of ten, the
sample would then consist either entirely of such workers or of none of them.

* Sampling interval, k = Size of the universe / Size of the sample
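The periodicity problem can be made concrete with a hypothetical wage list in which every tenth worker earns Rs. 200 and the rest Rs. 100 (figures invented for illustration). With a sampling interval of ten, the sample mean depends entirely on the random start, and neither possible result equals the true mean.

```python
# Hypothetical list: positions 10, 20, 30, ... earn Rs. 200, all others Rs. 100
wages = [200 if position % 10 == 0 else 100 for position in range(1, 1001)]

def systematic_mean(values, k, start):
    """Mean of the items at positions start, start + k, start + 2k, ..."""
    picked = values[start - 1::k]
    return sum(picked) / len(picked)

print(systematic_mean(wages, 10, 10))   # 200.0  (start hits the high earners)
print(systematic_mean(wages, 10, 3))    # 100.0  (start misses them entirely)
print(sum(wages) / len(wages))          # 110.0  (true mean)
```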

(iii) Multi-stage Sampling 

As the name implies, this method refers to a sampling procedure
which is carried out in several stages. The material is regarded as made
up of a number of first stage sampling units, each of which is made up of a
number of second stage units, etc. At first, the first stage units are sampled
by some suitable method, such as simple random sampling. Then, a
sample of second stage units is selected from each of the selected first stage
units, again by some suitable method which may be the same as or different
from the method employed for the first stage units. Further stages
may be added as required. The procedure may be illustrated as follows :

Let us suppose it is decided to take a sample of 5,000 households
from the State of U.P. At the first stage, the State may be divided into a
number of districts and a few districts selected at random. At the second
stage, each district may be sub-divided into a number of villages and a
sample of villages may be taken at random. At the third stage, a number
of households may be selected from each of the villages selected at the
second stage. In this way, at each stage the sample size becomes smaller
and smaller.
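A minimal sketch of the three stages, assuming a hypothetical frame of 20 districts, each with 30 villages of 200 households; selecting 5 districts, 10 villages per district and 100 households per village yields the 5,000 households of the example. For simplicity the whole frame is built in advance, whereas in a real survey the village and household lists would be prepared only for the units already selected.

```python
import random

rng = random.Random(2024)

# Hypothetical frame: 20 districts x 30 villages x 200 households
frame = {
    f"D{d}": {f"D{d}-V{v}": [f"D{d}-V{v}-H{h}" for h in range(1, 201)]
              for v in range(1, 31)}
    for d in range(1, 21)
}

def three_stage_sample(frame, n_districts, n_villages, n_households):
    households = []
    for district in rng.sample(sorted(frame), n_districts):            # stage 1
        villages = frame[district]
        for village in rng.sample(sorted(villages), n_villages):       # stage 2
            households.extend(rng.sample(villages[village], n_households))  # stage 3
    return households

sample = three_stage_sample(frame, 5, 10, 100)
print(len(sample))   # 5000
```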

Merits. Multi-stage sampling introduces flexibility in the sampling
method which is lacking in the other methods. It enables existing divisions
and sub-divisions of the population to be used as units at various stages,
and permits the field work to be concentrated and yet a large area to be
covered. Another advantage of the method is that sub-division into
second stage units (i.e., the construction of the second stage frame) need
be carried out for only those first stage units which are included in the
sample. It is, therefore, particularly valuable in surveys of underdeveloped
areas where no frame is generally sufficiently detailed and accurate for
sub-division of the material into reasonably small sampling units.

Limitations. However, a multi-stage sample is in general less 
accurate than a sample containing the same number of final stage units 
which have been selected by some suitable single stage process. 


B. NON-RANDOM SAMPLING METHODS 


(i) Judgment Sampling 
In judgment sampling the choice of sample items depends exclusively 
on the discretion of the investigator. In other words, the investigator 
exercises his judgment in the choice and includes those items in the sample
which he thinks are most typical of the universe with regard to the charac- 
teristics under investigation. For example, if a sample of ten students is 
to be selected from a class of sixty for analysing the spending habits of 
students, the investigator would select 10 students who, in his opinion, are 
representative of the class. 


Merits. Though the principles of sampling theory are not applicable
to judgment sampling, the method is often used in solving many types of
economic and business problems. The use of judgment sampling is justified
under a variety of circumstances :

(i) When only a small number of sampling units is in the universe,
simple random selection may miss the more important elements, whereas
judgment selection would certainly include them in the sample.

(ii) When we want to study some unknown traits of a population, 
some of whose characteristics are known, we may then stratify the popu- 
lation according to these known properties and select sampling units from 
each stratum on the basis of judgment. This method is used to obtain a 
more representative sample. 

(iii) In solving everyday business problems and making public policy
decisions, executives and public officials are often pressed for time and 
cannot wait for probability sample designs. Judgment sampling is then 
the only practical method to arrive at solutions to their urgent problems. 

Limitations. This method, though simple, is not scientific because 
there is a big possibility of the results being affected by the personal pre- 
judice or bias of the investigator. Thus, judgment sampling involves the 
risk that the investigator may establish foregone conclusions by including 
those items in the sample which conform to his preconceived notions. For 
example, if an investigator holds the view that the wages of workers in a
certain establishment are very low, and if he adopts the judgment sampling
method, he may include only those workers in the sample whose wages are
low and thereby establish his point of view which may be far from the 
truth. Since an element of subjectiveness is possible, this method cannot 
be recommended for general use. 

(ii) Convenience Sampling 


The method of convenience sampling is also called the chunk method. A
chunk refers to that fraction of the population being investigated which
is selected neither by probability nor by judgment but by convenience.
A sample obtained from readily available lists such as automobile
registrations, telephone directories, etc. is a convenience sample and not a
random sample, even if the sample is drawn at random from the lists.
If a person is to submit a project report on labour-management relations
in the textile industry and he takes a textile mill close to his office and
interviews some people there, he is following the convenience sampling
method.

The results obtained by following convenience sampling method 
can hardly be representative of the population—they are generally biased 
and unsatisfactory. However, convenience sampling is often used for 
making pilot studies. Questions may be tested and preliminary information 
may be obtained by the chunk before the final sampling design is decided 
upon. 

(iii) Quota Sampling 

Quota sampling is a type of judgment sampling.* In a quota sample,
quotas are set up according to some specified characteristics, such as so
many in each of several income groups, so many in each age group, so many
with certain political or religious affiliations, and so on. Each interviewer
is then told to interview a certain number of persons, which constitutes his
quota. Within the quotas, the selection of sample items depends on personal
judgment. For example, in a radio listening survey, the interviewers
may be told to interview 500 people living in a certain area, and that
out of every 100 persons interviewed, 60 are to be housewives, 25 farmers
and 15 children under the age of 15. Within these quotas the interviewer
is free to select the people interviewed. The cost per person interviewed
may be relatively small for a quota sample, but there are numerous
opportunities for biases which may invalidate the results. For example,
interviewers may miss farmers working in the fields or talk only with those
housewives who are at home. If a person refuses to respond, the interviewer
simply selects someone else. Because of the risk of personal prejudice and
bias entering the process of selection, quota sampling is not widely
used in practical work.

* Judgment sampling is a type of non-random sampling and is also called
purposive sampling or deliberate sampling.


Quota sampling is often used in public opinion studies. It occasionally 
provides satisfactory results if the interviewers are carefully trained and
if they follow their instructions closely. 
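The quota arithmetic of the radio listening example is sketched below; the function name `quota_targets` is illustrative. Note that the code can only fix how many persons of each kind are to be interviewed: which individuals are actually chosen within each quota is left to the interviewer, and that is precisely where the bias described above enters.

```python
def quota_targets(total_interviews, per_hundred):
    """Number of interviews for each group, given its share per 100 persons."""
    return {group: total_interviews * share // 100
            for group, share in per_hundred.items()}

print(quota_targets(500, {"housewives": 60, "farmers": 25, "children under 15": 15}))
# {'housewives': 300, 'farmers': 125, 'children under 15': 75}
```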


Selection of Appropriate Method of Sampling 

Having discussed the various methods of sampling, the question now
arises as to which method to adopt in a particular situation. It should
be noted that no one method can be regarded as best under all circumstances;
each method has its own speciality. A number of factors, such as the nature
of the problem, size of universe, size of the sample, availability of finance,
time, etc., would influence the selection of a particular method of sampling.


Size of Sample 

An important decision that has to be taken while adopting a sampling
technique is about the size of the sample. Different opinions have
been expressed by experts on this point. For example, some have suggested
that the sample size should be 5% of the size of the population, while
others are of the opinion that the sample size should be at least 10%.
However, these views are of little use, as in practice the appropriate sample size
depends on various factors relating to the subject under investigation, like
the time aspect, the cost aspect, the degree of accuracy desired, etc.
Sampling theory is of little help in arriving at a good estimate of the sample
size in any particular situation. However, the following two considerations
may be kept in mind in determining the appropriate size of the
sample.

1. The size of the sample should increase as the variation in the
individual items increases.

2. The greater the degree of accuracy desired, the larger should be
the sample size.
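Both considerations can be seen in a common textbook formula for estimating a population mean, n = (z x sigma / E)^2, where sigma is the standard deviation of the items, E the acceptable margin of error and z = 1.96 for 95% confidence. This formula is not given in the text and is shown only as an illustration; it assumes simple random sampling from a large population and, importantly, that sigma is known in advance, which in practice it often is not. The values of sigma and E below are hypothetical.

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Sample size needed so that the mean is estimated within +/- margin."""
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size_for_mean(sigma=10, margin=2))   # 97
print(sample_size_for_mean(sigma=20, margin=2))   # 385  (more variation)
print(sample_size_for_mean(sigma=10, margin=1))   # 385  (more accuracy)
```

Doubling the variation or halving the permitted error each roughly quadruples the required sample.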


Merits of Sampling 

The sampling technique has the following merits over the complete
enumeration survey :

1. Less time. Since the sample is a study of a part of the population,
considerable time and labour are saved when a sample survey is
carried out. Time is saved not only in collecting data but also in processing
it. For these reasons a sample provides more timely data in practice than
a census.
2. Less cost. Although the amount of effort and expense involved
in collecting information is always greater per unit of the sample than a 
complete census, the total financial burden of a sample survey is generally 
less than that of a complete census. This is because of the fact that in 
sampling, we study only a part of population and the total expense of 
collecting data is less than that required when the census method is adopt- 
ed. This is a great advantage particularly in an underdeveloped economy 
where much of the information would be difficult to collect by the census 
method for lack of adequate resources. 

3. More reliable results. Although the sampling technique involves
certain inaccuracies owing to sampling errors, the result obtained is
generally more reliable than that obtained from a complete count. There
are several reasons for it. First, it is always possible to determine the
extent of sampling errors. Secondly, other types of errors to which a
survey is subject, such as inaccuracy of information, incompleteness of
returns, etc., are likely to be more serious in a complete census than in
a sample survey. This is because more effective precautions can be taken
.... degree of confidence because of our knowledge of the probable size of
error. Thirdly, it is possible to avail of the services of experts and to
impart thorough training to the investigators in a sample survey, which
further reduces the possibility of errors. Follow-up work can also be
undertaken much .... a complete census can only be tested for accuracy
by some type of sampling check.

(a) We may collect the necessary data from each one of the 1,000
people through a questionnaire containing, say, 10 questions (census
method) ; or

(b) We may take a sample of 100 persons (i.e., 10% of the population)
and prepare a questionnaire containing as many as 100 questions. The
expenses involved in the latter case would be almost the same as in the
former, but it will enable nine times more information to be obtained.

5. Sampling method is the only method that can be used in certain
cases. There are some cases in which the census method is inapplicable
and the only practicable means is provided by the sample method. For
example, if one is interested in testing the breaking strength of chalks
manufactured in a factory, under the census method all the chalks would
be broken in the process of testing. Hence, the census method is impracticable
and resort must be had to the sample method. Similarly, if the
producer wants to find out whether the tensile strength of a lot of steel
wires meets the specified standard, he must resort to the sample method,
because a census would mean complete destruction of all the wires.

6. The sample method is often used to judge the accuracy of the
information obtained on a census basis. For example, in the population
census, which is conducted every 10 years in our country, the field
officers employ the sample method to determine the accuracy of the
information obtained by the enumerators on the census basis.
Limitations of Sampling 

Despite the various advantages of sampling, it is not altogether free from 
limitations. Some of the difficulties involved in sampling are stated below : 

1. A sample survey must be carefully planned and executed other- 
wise the results obtained may be inaccurate and misleading. Of course, 
even for a complete count care must be taken but serious errors may arise 
in sampling, if the sampling procedure is not perfect. 

2. Sampling generally requires the services of experts, if only
for consultation purposes. In the absence of qualified and experienced
persons, the information obtained from sample surveys cannot be relied
upon. In India, shortage of experts in the sampling field is a serious
hurdle in the way of reliable statistics.

3. At times the sampling plan may be so complicated that it
requires more time, labour and money than a complete count. This is so
if the size of the sample is a large proportion of the total population and if
complicated weighting procedures are used. With each additional complication
in the survey, the chances of errors multiply and greater care has to
be taken which, in turn, means more time and labour.

4. If the information is required for each and every unit in the
domain of study, a complete enumeration survey is necessary.

SAMPLING AND NON-SAMPLING ERRORS 

To appreciate the need for sample surveys, it is necessary to understand
clearly the role of sampling and non-sampling errors in complete
enumeration and sample surveys. The error arising from drawing inferences
about the population on the basis of a few observations (sampling)
is termed sampling error. Clearly, the sampling error in this sense is
non-existent in a complete enumeration survey, since the whole population is
surveyed. However, the errors arising mainly at the stages of ascertainment
and processing of data, which are termed non-sampling errors, are common
to both complete enumeration and sample surveys.

I. Sampling Errors

Even if the utmost care has been taken in selecting a sample, the results
derived from a sample study may not be exactly equal to the true value
in the population. The reason is that the estimate is based on a part and not
on the whole, and samples are seldom, if ever, perfect miniatures of the
population. Hence sampling gives rise to certain errors known as sampling
errors (or sampling fluctuations). These errors would not be present
in a complete enumeration survey. However, these errors can be controlled.
Modern sampling theory helps in designing the survey in such a
manner that the sampling errors can be made small.

Sampling errors are of two types: biased and unbiased.

(1) Biased errors. These errors arise from any bias in selection,
estimation, etc. For example, if in place of simple random sampling,
deliberate sampling has been used in a particular case, some bias is
introduced in the result and hence such errors are called biased sampling errors.

(2) Unbiased errors. These errors arise due to chance differences between
the members of the population included in the sample and those not included.

Thus the total sampling error is made up of the error due to bias, if any,
and the random sampling error. The essence of bias is that it forms a
constant component of error that does not decrease in a large population as the
number in the sample increases. Such error is, therefore, also known as
cumulative or non-compensating error. The random sampling error, on the
other hand, decreases on an average as the size of the sample increases. Such
error is, therefore, also known as non-cumulative or compensating error.
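The contrast between the two components can be illustrated by simulation. In the sketch below, a hypothetical universe of 20,000 values is sampled repeatedly at random; in the biased case every reading is assumed to be recorded 5 units too high (an invented constant error standing in for any source of bias). The average error stays near 5 however large the sample, while the spread of the error, the random sampling error, shrinks as n grows.

```python
import random
import statistics

rng = random.Random(3)
population = [rng.gauss(100, 20) for _ in range(20_000)]   # hypothetical universe
true_mean = statistics.fmean(population)
BIAS = 5.0   # hypothetical constant error: every reading recorded 5 units too high

def error_summary(n, biased, repetitions=500):
    """Average error (bias) and spread of error (random error) of the sample mean."""
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)
        if biased:
            sample = [x + BIAS for x in sample]
        errors.append(statistics.fmean(sample) - true_mean)
    return statistics.fmean(errors), statistics.stdev(errors)

for n in (10, 100, 1000):
    print(n, error_summary(n, biased=False), error_summary(n, biased=True))
```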
Causes of Bias 

Bias may arise due to : 

(i) faulty process of selection ;
(ii) faulty work during the collection of information ; and
(iii) faulty methods of analysis.

(i) Faulty Selection. Faulty selection of the sample may give rise to 
bias in a number of ways, such as : 

(a) Deliberate selection of a ‘representative’ sample. 

(b) Conscious or unconscious bias in the selection of a ‘random’ 
sample. The randomness of selection may not really exist, even though 
the investigator claims that he has a random sample if he allows his desire 
to obtain a certain result to influence his selection. 

(c) Substitution. Substitution of an item in place of one chosen in a
random sample sometimes leads to bias. Thus, if it was decided to interview
every 50th householder in the street, it would be inappropriate to
interview the 51st or any other number in his place, as the characteristics
possessed by them may differ from those of the persons who were originally
to be included in the sample.

(d) Non-response. If all the items to be included in the sample are 
not covered, there will be bias even though no substitution has been 
attempted. This fault particularly occurs in mailed questionnaires, which 
are incompletely returned. Moreover, the information supplied by the 
informants may also be biased. 

(e) An appeal to the vanity of the person questioned may give rise 
to yet another kind of bias. For example, the question ‘Are you a good 
student ?’ is such that most of the students would succumb to vanity and
answer ‘Yes.’ 

(ii) Bias due to Faulty Collection of Data. Any consistent error in
measurement will give rise to bias whether the measurements are carried
out on a sample or on all the units of the population. The danger of
error is, however, likely to be greater in sampling work, since the units
measured are often smaller. Bias may arise due to improper formulation
of the decision problem, wrongly defining the population, specifying
the wrong decision, securing an inadequate frame, and so on. Biased
observations may result from a poorly designed questionnaire, an ill-trained
interviewer, failure of a respondent's memory, etc. Bias in the flow of
data may be due to an unorganised collection procedure or to faulty editing
or coding of responses.

(iii) Bias in Analysis. In addition to bias which arises from a faulty
process of selection and faulty collection of information, faulty methods
of analysis may also introduce bias. Such bias can be avoided by adopting
the proper methods of analysis.
Avoidance of Bias


If possibilities of bias exist, fully objective conclusions cannot be 
drawn. The first essential of any sampling or census procedure must,
therefore, be the elimination of all sources of bias. The simplest and the 
only certain way of avoiding bias in the selection process is for the sample 
to be drawn either entirely at random or at random, subject to restrictions 
which, while improving the accuracy, are of such a nature that they do 
not introduce bias in the results. In certain cases, systematic selection 
may also be permissible. 


Method of Reducing Sampling Errors 


Once the absence of bias has been ensured, attention should be given 
to the random sampling errors. Such errors must be reduced to the
minimum so as to attain the desired accuracy. 


Apart from reducing errors of bias, the simplest way of increasing
the accuracy of a sample is to increase its size. The sampling error
usually decreases with an increase in sample size (the number of units
selected in the sample); in fact, in many situations the sampling error is
inversely proportional to the square root of the sample size, as can be
seen from the diagram below.


[Diagram: sampling error plotted against sample size]

From this diagram it is clear that though the reduction in sampling
error is substantial for initial increases in sample size, it becomes
marginal after a certain stage. In other words, considerably greater effort
is needed after a certain stage to decrease the sampling error than in the
initial instances. Hence, after that stage, a sizable reduction in cost can be
achieved by lowering even slightly the precision required. From this point
of view, there is a strong case for resorting to a sample survey to provide
estimates within permissible margins of error instead of a complete
enumeration survey, as in the latter the effort and the cost needed will be
substantially higher due to the attempt to reduce the sampling error to zero.
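The square-root relationship can be made concrete with a short calculation. The sketch below is only an illustration under stated assumptions: it takes the standard error of the mean under simple random sampling, σ/√n (finite-population correction ignored), as the measure of sampling error, and the value σ = 10 and the sample sizes are hypothetical.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean, sigma / sqrt(n).

    Assumes simple random sampling and ignores the finite-population correction.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

def marginal_gains(sigma: float, sizes):
    """Reduction in standard error obtained at each step of `sizes`."""
    errors = [standard_error(sigma, n) for n in sizes]
    return [round(a - b, 4) for a, b in zip(errors, errors[1:])]

if __name__ == "__main__":
    sigma = 10.0                  # hypothetical population standard deviation
    sizes = [25, 100, 400, 1600]  # each step quadruples the sample
    for n in sizes:
        print(n, standard_error(sigma, n))
    print("reductions:", marginal_gains(sigma, sizes))
```

Each quadrupling of the sample halves the error, but the absolute reduction shrinks (1.0, then 0.5, then 0.25) while the number of extra units required grows (75, then 300, then 1,200). This is the diminishing return described above.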


As regards non-sampling errors, they are likely to be greater in the case
of a complete enumeration survey than in the case of a sample survey, since
in a sample survey it is possible to reduce the non-sampling errors to a
greater extent by using better organization and suitably trained personnel
at the field and tabulation stages. The behaviour of the non-sampling errors
with increase in sample size is likely to be the opposite of that of the
sampling error, that is, the non-sampling error is likely to increase with
increase in sample size. In many situations, it is quite possible that the
non-sampling error in a complete enumeration survey is greater than both
the sampling and non-sampling errors taken together in a sample survey, and
naturally in such situations the latter is to be preferred to the former.
II. Non-Sampling Errors

When a complete enumeration of units in the universe is made, one
would expect that it would give rise to data free from errors. However,
in practice it is not so. For example, it is difficult to completely avoid
errors of observation or ascertainment. So also, in the processing of data,
tabulation errors may be committed, affecting the final results. Errors
arising in this manner are termed non-sampling errors, as they are due
to factors other than the inductive process of inferring about the
population from a sample. Thus, the data obtained in an investigation by
complete enumeration, although free from sampling error, would still be
subject to non-sampling error, whereas the results of a sample survey
would be subject to sampling error as well as non-sampling error.

Non-sampling errors can occur at every stage of planning and execution
of the census or survey. Such errors can arise due to a number of
causes such as defective methods of data collection and tabulation, faulty
definition, incomplete coverage of the population or sample, etc. More
specifically, non-sampling errors may arise from one or more of the
following factors :

1. Data specification being inadequate and inconsistent with respect
to the objectives of the census or survey. 

2. Inappropriate statistical unit. 

3. Inaccurate or inappropriate methods of interview, observation 
or measurement with inadequate or ambiguous schedules, definitions or 
instructions. 

4. Lack of trained and experienced investigators. 

5. Lack of adequate inspection and supervision of primary staff. 

6. Errors due to non-response, i.e., incomplete coverage in respect 
of units. 

7. Errors in data processing operations such as coding, punching, 
verification, tabulation, etc. 

8. Errors committed during presentation and printing of tabulated
results.

These sources are not exhaustive, but are given to indicate some of
the possible sources of error. In a sample survey, non-sampling errors
may also arise due to a defective frame and faulty selection of sampling
units.

Control of Non-sampling Errors. In some situations the non-sampling
errors may be large and deserve greater attention than the sampling
errors. While, in general, sampling errors decrease with increase in
sample size, non-sampling errors tend to increase with the sample size. In
the case of complete enumeration, non-sampling error, and in the case of
sample surveys, both sampling and non-sampling errors, need to be
controlled and reduced to a level at which their presence does not vitiate
the use of the final results.
5. Classification and Tabulation of Data


In the last few chapters the process of data collection was discussed.
The collected data are usually contained in schedules or questionnaires,
but not in an easily assimilable form. The answers will require
some analysis if their salient points are to be brought out. As a rule, the
first step in the analysis is to classify and tabulate the information collected,
or, if published statistics have been employed, to rearrange these into
new groups and tabulate the new arrangement. In some investigations,
the classification and tabulation may give such a clear picture
of the significance of the material arranged that no further analysis is
required. In other cases these processes, though they may materially assist
the analysis, are not sufficient for a complete presentation of the facts.
They are, however, very important whether they complete the analysis or
form only part of it. The questionnaire may have been very carefully
drawn up and the answers may be both complete and accurate, but until
these answers are all brought together into the classes to which they belong
and the whole information is displayed in a tabular form, no one will be a
great deal wiser as to the contents of the replies.


Although the phrase “classification and tabulation” has been used,
classification is, in effect, only the first step in tabulation, for, in general,
items having common characteristics must be brought together before the
data can be displayed in tabular form.


Meaning of Classification 


After collection and editing of data, the first step towards further
processing them is classification. Classification is the grouping of
related facts into classes. Facts in one class differ from those of another
class with respect to some characteristic called a basis of classification.
Sorting facts on one basis of classification and then on another basis is
called cross classification. This process can be repeated as many times as
there are possible bases of classification. Classification of data is a function
very similar to that of sorting letters in a post-office. It is well known that
the letters collected in a post-office are sorted into different lots on a
geographical basis, i.e., in accordance with their destinations such as Bombay,
Calcutta, Kanpur, Jaipur, etc. They are then put in separate bags,
each containing letters with a common characteristic, viz., having the same
destination. Classification of statistical data is comparable to this sorting
operation. To take another example, when students seek admission in a
college they submit applications to the office. The application forms contain
particulars about their performance in the previous examinations, their
date of birth, sex, nationality, etc. If one is interested in finding out how
many first, second and third class students have joined the college, one
may look into each and every form and note whether it relates to a first
class student, second class student, etc. One may find that out of 1,000
students who took admission, 50 had a first class, 800 a second class and 150
a third class. The process with the help of which this information in a
summary form is obtained is called the classification of data.


Objects of Classification
The principal objectives of classifying data are :
1. To condense the mass of data in such a manner that similarities
and dissimilarities can be readily apprehended. Millions of figures can thus
be arranged in a few classes having common features.

2. To facilitate comparison.

3. To pinpoint the most significant features of the data at a glance.

4. To give prominence to the important information gathered while
dropping out the unnecessary elements.

5. To enable a statistical treatment of the material collected.


Types of Classification 


Broadly, the data can be classified on the following four bases : 
1. Geographical, i.e., area-wise, e.g., cities, districts, etc. 

2. Chronological, i.e., on the basis of time. 

3. Qualitative, i.e., according to some attributes.

4. Quantitative, i.e., in terms of magnitudes.


1. Geographical Classification 
In this type of classification, data are classified on the basis of
geographical or locational differences between the various items. For
instance, the production of sugarcane in India may be presented State-wise
in the following manner :
PRODUCTION OF SUGARCANE FOR THE YEAR 1976 (Figures imaginary) 


Name of State Sugarcane Production 
(in million tonnes) 


Uttar Pradesh 48 
Bihar 18 
Tamil Nadu 8 
Maharashtra 4 
Other States 2 

Total 80 


Geographical classifications are usually listed in alphabetical order
for easy reference. Items may also be listed by size to emphasize the
important areas, as in ranking the States by population. Normally, in
reference tables the first approach is followed and in summary tables the
second.
2. Chronological Classification 

When data are observed over a period of time, the type of classification
is known as chronological classification. For example, we may
present the figures of population (or production, sales, etc.) as follows :

POPULATION OF INDIA FROM 1921 TO 1971

Year    Population of India    Year    Population of India
        (in millions)                  (in millions)

1921    248                    1951    357
1931    276                    1961    438
1941    313                    1971    536


Time series are usually listed in chronological order, normally starting
with the earliest period. When the major emphasis falls on the most
recent events, a reverse time order may be used.

3. Qualitative Classification 


In qualitative classification data are classified on the basis of some
attribute or quality such as sex, colour of hair, literacy, religion, etc.
The point to note in this type of classification is that the attribute under
study cannot be measured : one can only find out whether it is present or
absent in the units of the population under study. For example, if the
attribute under study is blindness, one can only find out how many persons
are blind in a given population. It is not possible to measure the degree
of blindness in each case. Thus when only one attribute is studied,
two classes are formed, one possessing the attribute and the other not
possessing the attribute. This type of classification is known as simple
classification. For example, the population under study may be divided
into two categories as follows :

              Population
            _______|_______
           |               |
         Blind         Not blind

In a similar manner, we may classify population on the basis of sex,
i.e., into males and females, or literacy, i.e., into literates and illiterates,
and so on. The type of classification where only two classes are formed
is also called two-fold or dichotomous classification. If instead of forming
only two classes we further divide the data on the basis of some attribute or
attributes so as to form several classes, the classification is known as
manifold classification. For example, we may first divide the population into
males and females on the basis of the attribute ‘sex’ ; each of these classes
may be further subdivided into ‘literate’ and ‘illiterate’ on the basis of the
attribute ‘literacy’. Further classification can be made on the basis of some
other attribute, say, employment. An example of manifold classification
is given below :


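                              Population
                  _________________|_________________
                 |                                   |
               Males                              Females
          _______|_______                     _______|_______
         |               |                   |               |
     Literates      Illiterates          Literates      Illiterates
      __|__           __|__               __|__           __|__
     |     |         |     |             |     |         |     |
   Emp. Unemp.     Emp. Unemp.         Emp. Unemp.     Emp. Unemp.

(Manifold Classification)
Note. Emp. indicates Employed and Unemp. indicates Unemployed.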
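Manifold classification amounts to counting units by successively more attributes. The following Python sketch assumes each person is recorded as a (sex, literacy, employment) tuple; the seven records are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical records: (sex, literacy, employment status) of each person.
records = [
    ("Male", "Literate", "Employed"),
    ("Male", "Literate", "Unemployed"),
    ("Male", "Illiterate", "Employed"),
    ("Female", "Literate", "Employed"),
    ("Female", "Illiterate", "Unemployed"),
    ("Female", "Illiterate", "Unemployed"),
    ("Male", "Literate", "Employed"),
]

def classify(records, depth):
    """Count records by their first `depth` attributes.

    depth=1 gives the two-fold classification by sex; depth=3 gives the
    manifold classification into (at most) 2 x 2 x 2 = 8 classes.
    """
    return Counter(record[:depth] for record in records)

if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(f"Classification by {depth} attribute(s):")
        for cls, count in sorted(classify(records, depth).items()):
            print("   ", " / ".join(cls), count)
```

Each extra attribute splits every existing class in two, which is why the diagram above ends in eight classes.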


4. Quantitative Classification


Quantitative classification refers to the classification of data according 
to some characteristic that can be measured, such as height, weight, etc.
For example, the students of a college may be classified according to weight 
as follows : 


Weight (in lb.)    No. of Students
90—100             50
100—110            200
110—120            260
120—130            360
130—140            90
140—150            40

Total              1,000


In this type of classification, there are two elements, namely, (i) the
variable, i.e., the weight in the above example, and (ii) the frequency, i.e.,
the number of students in each class. There were 50 students having weights
ranging from 90 to 100 lb., 200 students having weights ranging from
100 to 110 lb., and so on. Thus we can find out the ways in which the
frequencies are distributed.

Variable. A frequency distribution refers to data classified on the basis
of some variable that can be measured, such as prices, wages, age, number
of units produced or consumed. The term ‘variable’ refers to the
characteristic that varies in amount or magnitude in a frequency distribution. A
variable may be either continuous or discrete. A continuous variable is
capable of manifesting every conceivable fractional value within the range
of possibilities, such as the height or weight of persons or the weight of a
product. Thus, as a student grows, say, from 90 cm. to 150 cm., his height
passes through all values between these limits. On the other hand, a discrete
variable can take only separate, distinct values (usually whole numbers) and
cannot assume fractional values in between. The number of children in a
family, the number of employees and the number of machines in an
establishment are discrete variables. Broadly speaking, continuous data are
obtained through measurement and discrete data through counting.

[Tables: (a) Discrete Frequency Distribution, No. of Children (0 to 6) against No. of Families; (b) Continuous Frequency Distribution, Weight (lb.) against No. of Persons]


Although the theoretical distinction between continuous and discrete
variation is clear and precise, in practical statistical work it is only an
approximation. The reason is that even the most precise instruments of
measurement can measure only to a finite number of decimal places. Thus no
theoretically continuous series can be expected to flow continuously,
with one measurement touching another without any break, in actual
observations.

Formation of a Discrete Frequency Distribution


The process of preparing this type of distribution is very simple. We
have just to count the number of times a particular value is repeated, which
is called the frequency of that class. In order to facilitate counting, prepare
a column of ‘tally bars’. In another column, place all possible values of the
variable from the lowest to the highest. Then, for each observation, put a
bar (vertical line) opposite the particular value to which it relates. To
facilitate counting, blocks of five bars are prepared and some space is left
in between each block. We finally count the number of bars corresponding
to each value of the variable and place it in the column entitled ‘frequency’.
The process will be clear from the following example of the marks obtained
by 25 students in an examination :


Marks 


10 20 20 30 40 25 25 30 40 20 25 3 50 
15 25 30 40 50 40 5 30 25 23 15 40 


The marks may be put in the form of a frequency distribution as 
follows : 


Marks    Tally Bars    Frequency

10       |             1
15       ||            2
20       |||           3
25       ||||| ||      7
30       ||||          4
40       |||||         5
50       |||           3

Total                  25


This shows that there was 1 student who scored 10 marks, 2
students who scored 15 marks, and so on. 

This method of classifying helps in condensing the data only where 
values are largely repeated, otherwise hardly any condensation will be 
done. In practice this method is rarely used. 
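The counting procedure can be sketched in Python. The printed list of marks above contains three entries (printed as 3, 5 and 23) that do not appear in the frequency table; this sketch assumes they are 50, 25 and 25, the only values that make the list agree with the table. The tally is drawn in blocks of five bars, as described in the text.

```python
from collections import Counter

# Marks of the 25 students. Assumption: the three illegible entries
# (printed as 3, 5 and 23) are taken as 50, 25 and 25.
marks = [10, 20, 20, 30, 40, 25, 25, 30, 40, 20, 25, 50, 50,
         15, 25, 30, 40, 50, 40, 25, 30, 25, 25, 15, 40]

def frequency_table(values):
    """Return (value, tally bars, frequency) rows, lowest value first."""
    counts = Counter(values)
    rows = []
    for value in sorted(counts):
        f = counts[value]
        # Blocks of five bars, then any remainder, separated by spaces.
        blocks = ["|||||"] * (f // 5) + (["|" * (f % 5)] if f % 5 else [])
        rows.append((value, " ".join(blocks), f))
    return rows

if __name__ == "__main__":
    for value, tally, f in frequency_table(marks):
        print(f"{value:>5}  {tally:<10} {f:>3}")
```

The order in which the marks are listed does not affect the result; only how often each value occurs matters.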


Classification according to Class-intervals 

This type of classification is most popular in practice. The following 
technical terms are important when data are classified according to 
class-intervals : 

(i) Class Limits. The class limits are the lowest and the highest
values that can be included in the class. For example, take the class
20—40. The lowest value of this class is 20 and the highest 40. The
two boundaries of a class are known as the lower limit and the upper limit
of the class. The lower limit of a class is the value below which there can
be no item in the class. The upper limit of a class is the value above which
no item can belong to that class. Of the class 70—89, 70 is the lower
limit and 89 the upper limit, i.e., in this class there can be no value
which is less than 70 or more than 89. Similarly, if we take the class
90—109, there can be no value in that class which is less than 90 or more
than 109.

(ii) Class-intervals. The span of a class, that is, the difference
between the upper limit and the lower limit, is known as the class-interval.
For example, in the class 20—40, the class-interval is 20 (i.e., 40 minus
20). The size of the class-interval is determined by the number of classes
and the total range in the data. Several systems have been devised to
determine the width of the class-interval. One method, suggested by
Sturges*, is as follows :

$$ i = \frac{\text{Highest value} - \text{Lowest value}}{1 + 3.322 \log_{10} N} $$

where i is the class-interval and N is the total number of observations.

The specific figure of class-interval secured in a given instance may
come out to be a fractional value quite unsuited to actual use. For example,
by applying the above formula, we may get i = 10.126 in a particular case.
In such a situation an approximate round number close to the theoretical
value should be taken. Thus, instead of 10.126, we should take 10 as the
class-interval.
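Sturges' rule and the subsequent rounding can be sketched as below. The rounding helper is an assumption that follows principle (2) later in this chapter (prefer 5 or a multiple of 5, otherwise a whole number from 1 to 4); the figures in the usage lines (lowest value 23, highest 106, N = 100) are those of Illustration 2.

```python
import math

def sturges_width(highest: float, lowest: float, n: int) -> float:
    """Theoretical class-interval: (highest - lowest) / (1 + 3.322 log10 N)."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return (highest - lowest) / (1 + 3.322 * math.log10(n))

def round_width(width: float) -> int:
    """Round a theoretical width to a convenient figure (assumed rule):
    the nearest multiple of 5 when the width is 5 or more, otherwise
    the nearest whole number from 1 to 4."""
    if width >= 5:
        return max(5, 5 * round(width / 5))
    return min(4, max(1, round(width)))

if __name__ == "__main__":
    w = sturges_width(106, 23, 100)
    print(round(w, 3), round_width(w))
    print(round_width(10.126))
```

With these figures the rule gives about 10.86, which rounds to 10, the class-interval adopted in the solution of Illustration 2; the value 10.126 quoted above also rounds to 10.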

(iii) Class Frequency. The number of observations corresponding
to a particular class is known as the frequency of that class or the class
frequency. In the following illustration, the frequency of the class
100—200 is 50, which implies that there are 50 persons having income
between Rs. 100 and Rs. 200. If we add together the frequencies of all
individual classes, we obtain the total frequency. Thus, in the same problem,
the total frequency of the six classes is 550, which means that in all there are
550 persons whose income has been studied.

Class Mid-point. It is the value lying half-way between the lower
and upper class limits of a class-interval. The mid-point of a class is
ascertained as follows :

$$ \text{Mid-point of a class} = \frac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2} $$

For the purpose of further calculations in statistical work, the mid-point of
each class is taken to represent that class.
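As a worked example, the sketch below computes the class-interval and mid-point of each class of the income distribution used in this section (100—200: 50, 200—300: 100, 300—400: 200, 400—500: 150, 500—600: 40, 600—700: 10 persons), together with the total frequency. The function names are illustrative only.

```python
# Income classes (Rs.) as (lower limit, upper limit) with their frequencies.
classes = [((100, 200), 50), ((200, 300), 100), ((300, 400), 200),
           ((400, 500), 150), ((500, 600), 40), ((600, 700), 10)]

def mid_point(lower: float, upper: float) -> float:
    """Mid-point = (upper limit + lower limit) / 2."""
    return (upper + lower) / 2

def summarise(classes):
    """Return rows of (lower, upper, class-interval, mid-point, frequency)
    together with the total frequency."""
    rows = [(lo, hi, hi - lo, mid_point(lo, hi), f) for (lo, hi), f in classes]
    total = sum(f for _, f in classes)
    return rows, total

if __name__ == "__main__":
    rows, total = summarise(classes)
    for lo, hi, width, mid, f in rows:
        print(f"{lo}-{hi}  interval {width}  mid-point {mid}  frequency {f}")
    print("Total frequency:", total)
```

Every class here has an interval of 100, so the mid-points advance by 100 as well, from 150 to 650.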

There are two methods of classifying the data according to
class-intervals, namely (i) the ‘exclusive’ method, and (ii) the ‘inclusive’ method.

(i) ‘Exclusive’ Method. When the class-intervals are so fixed
that the upper limit of one class is the lower limit of the next class, it is
known as the ‘exclusive’ method of classification. The following data are
classified on this basis :

Income (Rs.)    No. of Persons

100—200         50
200—300         100
300—400         200
400—500         150
500—600         40
600—700         10
Total           550

*H. A. Sturges : Journal of the American Statistical Association, March 1926,
pp. 65-66.
It is clear that the ‘exclusive’ method ensures continuity of data inasmuch
as the upper limit of one class is the lower limit of the next class. Thus,
in the above example, there are 50 persons whose income is between
Rs. 100 and Rs. 199.99. A person whose income is Rs. 200 would be
included in the class Rs. 200—Rs. 300. This method is widely followed
in practice. However, it is confusing to a layman who has no knowledge
of Statistics. For example, if a questionnaire includes an item asking the
respondent the number of times he visits the Super Bazar in a month and
he is required to tick one of the categories 5—10, 10—15 and 15—20, a
person who visits the Super Bazar 10 times may find it difficult to decide
whether to put the tick in the space against the class 5—10 or 10—15.
In the absence of any specific instructions, some people may tick the class
5—10 while others tick 10—15. Hence, whenever this method is used, it
is necessary to give clear instructions in the questionnaire. However, the
reader should note that if class-intervals are given like 0—10, 10—20, etc.,
it is always presumed that the upper limit is exclusive, i.e., an item of that
value is not included in that class.

A better way of expressing the classes when exclusive method is 
followed is : 


Income (Rs.)          No. of Persons
100 but under 200     50
200 but under 300     100
300 but under 400     200
400 but under 500     150
500 but under 600     40
600 but under 700     10

Total                 550


It avoids the kind of confusion that arises when classes are expressed as
100—200, 200—300, 300—400, etc. It is suggested that in practice this
approach should be preferred over the previous one.

(ii) ‘Inclusive’ Method. Under the ‘inclusive’ method of classi- 
fication, the upper limit of one class is included in that class itself. The 
following example illustrates the method : 


Income (Rs.)    No. of Persons
100—199         50
200—299         100
300—399         200
400—499         150
500—599         50
600—699         10
Total           560

In the class 100—199 we include persons whose income is between Rs. 100
and Rs. 199. If the income of a person is exactly Rs. 200, he is included
in the next class. The above example makes it clear that there is no
confusion here of the kind we find under the ‘exclusive’ method. We may
also have classes like 100—199.5 or 100—199.9, and so on.


To decide whether to use the inclusive or the exclusive method, it is
important to determine whether the variable under observation is a
continuous or a discrete one. In the case of continuous variables, the
exclusive method (upper limit excluded) must be used. For example, the
variable height, being inherently a continuous one, should be stated as
60" and under 62", 62" and under 64", and so on. The inclusive method
should in general be used in the case of discrete variables. Thus in
classifying factories according to number of workers, the limits should be
stated as, for example, 100—199 employees, 200—299 employees, and not
100—200, 200—300, etc.
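The two conventions differ only in whether the upper limit belongs to the class. A minimal Python sketch follows; the class lists in the usage lines are hypothetical, modelled on the Super Bazar and factory examples in the text.

```python
from typing import Optional, Sequence, Tuple

Class = Tuple[float, float]

def find_class_exclusive(x: float, classes: Sequence[Class]) -> Optional[Class]:
    """Exclusive method: lower <= x < upper (upper limit excluded)."""
    for lower, upper in classes:
        if lower <= x < upper:
            return (lower, upper)
    return None

def find_class_inclusive(x: float, classes: Sequence[Class]) -> Optional[Class]:
    """Inclusive method: lower <= x <= upper (both limits included)."""
    for lower, upper in classes:
        if lower <= x <= upper:
            return (lower, upper)
    return None

if __name__ == "__main__":
    visits = [(5, 10), (10, 15), (15, 20)]         # monthly visits (exclusive)
    workers = [(100, 199), (200, 299), (300, 399)]  # employees (inclusive)
    print(find_class_exclusive(10, visits))     # 10 visits go into 10-15
    print(find_class_inclusive(200, workers))   # 200 employees go into 200-299
    print(find_class_inclusive(199.5, workers)) # a fractional value finds no class
```

The last call shows why the inclusive method suits discrete variables: a fractional value such as 199.5 falls in the gap between two inclusive classes, whereas under the exclusive method every value belongs to exactly one class.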


Principles of Classification 


It is difficult to lay down any hard and fast rules for classifying the 
data as the type of classification depends largely on the nature of the given
data and the object of classification. 


However, the following general considerations may be borne in mind 
for ensuring meaningful classification of data : 


(1) The number of classes should preferably be between 5 and 15.
However, there is no rigidity about it. The classes can be more than 15
depending upon the total number of items in the series and the details
required, but they should not be less than five because in that case the
classification may not reveal the essential characteristics. The choice of the
number of classes basically depends upon :


(a) the number of figures to be classified, 

(b) the magnitude of the figures, 

(c) the details required, and 

(d) ease of calculation of further statistical work. 


(2) As far as possible one should avoid such values of class-intervals
as 3, 7, 11, 26, 39, etc. Preferably, one should have class-intervals of
either five or multiples of 5 like 10, 20, 25, 100, etc. The reason is that
the human mind is more accustomed to thinking in terms of multiples
of 5, 10 and the like. However, where the data necessitate a class-interval
of less than 5, it can be any value between 1 and 4.


(3) The starting point, i.e., the lower limit of the first class, should
be zero, 5 or a multiple of 5. For example, if the lowest value of
the data is 63 and we have taken a class-interval of 10, then the first class 
should be 60—70, instead of 63—73. Similarly, if the lowest value of the 
data is 76 and the class-interval is 5 then the first class should be 75 to 80 
rather than 76 to 81. 


(4) To ensure continuity and to get the correct class-interval, we should
adopt the ‘exclusive’ method of classification. However, where the ‘inclusive’
method has been adopted, it is necessary to make an adjustment to
determine the correct class-interval and to have continuity. The adjustment
consists of finding the difference between the lower limit of the second class
and the upper limit of the first class, dividing the difference by two,
subtracting the value so obtained from all lower limits and adding the value to
all upper limits. This can be expressed in the form of a formula as follows :

$$ \text{Correction factor} = \frac{\text{Lower limit of the 2nd class} - \text{Upper limit of the 1st class}}{2} $$
How the adjustment is made when data are given by the inclusive
method can be seen from the following examples :

Weekly Wages (in Rs.)    No. of Workers
10—19                    5
20—29                    10
30—39                    15
40—49                    8
50—59                    2


To adjust the class limits, we take here the difference between 20 and
19, which is one. By dividing it by two we get 1/2 or 0.5. This (0.5) is
called the correction factor. Deduct 0.5 from the lower limits of all
classes and add 0.5 to the upper limits. The adjusted classes would then
be as follows :

Weekly Wages (in Rs.)    No. of Workers
9.5—19.5                 5
19.5—29.5                10
29.5—39.5                15
39.5—49.5                8
49.5—59.5                2


It should be noted that before adjustment the class-interval was 9 but 
after adjustment, it is 10. Observe another case. 


Variable    Frequency
5—9.5       8
10—14.5     10
15—19.5     2

The correction factor here is

$$ \frac{10 - 9.5}{2} = 0.25. $$

After adjustment the classes will be :

Variable        Frequency
4.75—9.75       8
9.75—14.75      10
14.75—19.75     2

The class-interval now is 5 and not 4.5. Taking a third example, if the class
limits are :

Variable    Frequency
5—9.99      8
10—14.99    10
15—19.99    2

the correction factor would be

$$ \frac{10 - 9.99}{2} = \frac{0.01}{2} = 0.005. $$

After adjustment the classes will become :

Variable         Frequency
4.995—9.995      8
9.995—14.995     10
14.995—19.995    2


(5) The intervals should be equal for all the classes. If intervals
are not of uniform width, it is difficult to make meaningful comparisons
between classes. Also, unequal class-intervals present problems when
graphing and computing certain averages and other statistical measures.
However, frequency distributions with unequal class-intervals are desirable
when there are large gaps in the data.


(6) If possible, open-end classes of the type below 100, 100—200,
200—500, above 500 should be avoided. An open-end distribution presents
problems of graphing and further analysis. When the frequency distribution
is being employed as the only technique of presentation, open-end
classes do not seriously reduce its usefulness as long as only a few items fall
in these classes. However, use of the distribution for purposes of further
mathematical computation is difficult because a mid-point value, which can
be used to represent the class, cannot be determined for an open-end class.


(7) In any frequency distribution the sizes of items or the values are
indicated on the left-hand side, and the number of times the items of those
sizes or values are repeated is indicated by frequencies on the right-hand
side, opposite the respective sizes or values.
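The adjustment described in principle (4) can be sketched as a small Python function; the three calls in the usage lines reproduce the examples worked out above (the printed values are rounded to hide binary floating-point noise).

```python
def adjust_inclusive(classes):
    """Convert inclusive class limits into continuous limits.

    correction factor = (lower limit of 2nd class - upper limit of 1st class) / 2;
    it is subtracted from every lower limit and added to every upper limit.
    Returns (correction factor, adjusted classes).
    """
    if len(classes) < 2:
        raise ValueError("at least two classes are needed")
    correction = (classes[1][0] - classes[0][1]) / 2
    adjusted = [(lower - correction, upper + correction) for lower, upper in classes]
    return correction, adjusted

if __name__ == "__main__":
    examples = [
        [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)],  # weekly wages
        [(5, 9.5), (10, 14.5), (15, 19.5)],
        [(5, 9.99), (10, 14.99), (15, 19.99)],
    ]
    for classes in examples:
        correction, adjusted = adjust_inclusive(classes)
        print(round(correction, 4),
              [(round(lo, 4), round(hi, 4)) for lo, hi in adjusted])
```

The function assumes, as the text does, that the gap between consecutive classes is the same throughout, so the first two classes are enough to find the correction factor.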
Illustration 1. The marks scored by 50 students in an examination paper are
given below :
80 3045? 4811/55, 0139, 25 31 12 18... 217. 54^ 89^ SE 
33 97437-4405 101644819; 26. 35 37 4 46 33 
51 37 . 88: а ИТ: IO ЭЗ ү 26-29. 085 ЯТ, 36) 235 
44 43 Zig al 43 22 31 47 34 18 15 
Prepare a frequency table with five class-intervals each of width 10 marks.
(B. Com., Bombay, 1972)


Solution : FREQUENCY DISTRIBUTION OF THE MARKS OF 50 STUDENTS

Marks    Tally Bars            No. of Students
10—20    ||||| |||             8
20—30    ||||| |||             8
30—40    ||||| ||||| |||||     15
40—50    ||||| ||||| ||        12
50—60    ||||| ||              7
Total                          50


Illustration 2. Prepare a statistical table from the following :
Weekly wages of 100 workers (in Rs.) of Factory A

88 23 21 28 36 96 94 93 36 99
82 24 24 55 88 99 55 86 82 36
96 39 26 54 87 100 56 84 83 46
102 48 27 26 29 100 59 83 84 48
104 46 30 29 40 101 60 89 46 49
106 33 36 30 40 10 70 90 49 50
104 36 37 40 46 106 72 94 50 60
24 39 49 46 66 107 76 96 46 67
26 78 50 44 43 46 79 99 36 68
29 61 56 99 93 48 80 102 32 51

(B.A. Hons. Econ., Delhi, 1973)


Solution. The lowest value is 23 and the highest 106. The difference between
the highest and the lowest value is 83. If we take a class-interval of 10, nine
classes would be formed. The first class should be taken as 20—30 instead of
23—33, as per the principles of classification.
FREQUENCY DISTRIBUTION OF THE WAGES OF 100 WORKERS

Wages (Rs.)    Tally Bars                 Frequency
20—30          ||||| ||||| |||            13
30—40
40—50          ||||| ||||| ||||| |||      18
50—60          ||||| |||||                10
60—70          ||||| |                    6
70—80          |||||                      5
80—90          ||||| ||||| |||||          15
90—100
100—110
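The steps followed in this solution (rounding the lowest value down to a convenient starting point, then counting the values into equal exclusive classes) can be sketched in Python. The rounding rule follows principle (3); the short list of wages in the usage lines is hypothetical and is not the data of Factory A.

```python
import math
from collections import Counter

def group_exclusive(values, width, start=None):
    """Count values into equal exclusive classes [lower, lower + width).

    If no starting point is given, the lowest value is rounded down to a
    multiple of 5 (e.g. 63 -> 60, 76 -> 75), as in principle (3).
    Returns a list of (lower limit, upper limit, frequency).
    """
    if start is None:
        start = 5 * math.floor(min(values) / 5)
    # Lower limit of the class to which each value belongs.
    counts = Counter(start + width * math.floor((v - start) / width) for v in values)
    table = []
    lower = start
    while lower <= max(values):
        table.append((lower, lower + width, counts.get(lower, 0)))
        lower += width
    return table

if __name__ == "__main__":
    wages = [23, 28, 36, 45, 47, 52, 106]  # hypothetical wages (Rs.)
    for lower, upper, f in group_exclusive(wages, 10):
        print(f"{lower}-{upper}: {f}")
```

With a lowest value of 23, a highest of 106 and a width of 10, the sketch produces the same nine classes, 20—30 to 100—110, as the solution above; classes with no values are kept with frequency 0 so that the intervals remain continuous.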


Illustration 3. The weights in gm. of 50 apples picked out at random from a
consignment are as follows :

106 107 76 82 109 107 115 93 187 95 123 125 11 
92 $6 70 126 68 130 129 139 19 115 128 100 186 
84 99 113 204 11 141 139 123 90 15 98 10 78 
90 107 81 130 75 85 105 10 80 118 82 


Form a grouped frequency table by dividing the variate range into intervals of
equal width, each of 20 gm., in such a way that the mid-value of the first
class-interval is 70 gm. (B.Sc. Agr., H.P., 1974)

Solution : FREQUENCY TABLE OF THE WEIGHT OF 50 APPLES 


Weight (in gm.) Tally Bars Frequency 


Total 
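When the mid-value of the first class is fixed, its lower limit is the mid-value minus half the width: 70 minus 20/2 gives 60, so the classes run 60 to 80, 80 to 100, and so on. A minimal sketch with a hypothetical function name, assuming exclusive classes and assuming the highest weight is 204 gm., as the extracted figures suggest:

```python
def classes_from_midvalue(first_mid, width, highest):
    """Equal-width exclusive classes whose first class has mid-value first_mid."""
    lower = first_mid - width / 2          # 70 - 20/2 = 60
    classes = []
    while lower <= highest:
        classes.append((lower, lower + width))
        lower += width
    return classes


classes = classes_from_midvalue(70, 20, 204)
mid_values = [(lo + hi) / 2 for lo, hi in classes]
print(classes[0], mid_values[0], len(classes))  # (60.0, 80.0) 70.0 8
```

The limits come out as floats because the width is halved; for an even width they are whole numbers and can be printed as such.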


Illustration 4. Present the following data of the percentage marks of 60 students
in the form of a frequency table with 10 classes of equal width, one class being 40—49 :


41 17 .89 68. 4 — x «0 38 70 06 67 82 
23 44° 57 49 "34773 УН ЖЧ Ыта СҮ ke ы. CNN 
60 33 09 79 38 30 4. 93 43 580 03 22 
$T 05 24: 54-65 3] Е 40! 23 5500 4l 
€ 3 2 55 9 8 609 55 6 .39 40 57 
(B. Com., Andhra, 1974)
Solution : FREQUENCY DISTRIBUTION OF THE MARKS OF 60 STUDENTS
Marks      Tally Bars              Frequency
0—9        ||||                        4
10—19      |||                         3
20—29      |||                         3
30—39      ||||| |||||                10
40—49      ||||| ||                    7
50—59      ||||| ||||                  9
60—69      ||||| ||||| |              11
70—79      |||||                       5
80—89      |||||                       5
90—99      |||                         3
Total                                 60
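In the inclusive method used here (0 to 9, 10 to 19, and so on) a mark belongs to the class whose lower limit is the mark rounded down to a multiple of 10, and the ten frequencies must add up to 60, so any one of them follows from the other nine. The sketch below, with hypothetical helper names, shows both points and renders counts as tally bars in groups of five (written as five strokes rather than four crossed by a fifth).

```python
def inclusive_class(mark, width=10):
    """Inclusive class (lower, upper) containing an integer mark, e.g. 49 -> (40, 49)."""
    lower = (mark // width) * width
    return (lower, lower + width - 1)


def tally(n):
    """Tally bars for n: complete groups of five, then the remainder."""
    groups, rest = divmod(n, 5)
    return " ".join(["|||||"] * groups + (["|" * rest] if rest else []))


# Frequencies of the other nine classes, as in the solution table.
known = {(10, 19): 3, (20, 29): 3, (30, 39): 10, (40, 49): 7, (50, 59): 9,
         (60, 69): 11, (70, 79): 5, (80, 89): 5, (90, 99): 3}
missing = 60 - sum(known.values())  # frequency of the 0-9 class

print(inclusive_class(49), inclusive_class(50))  # (40, 49) (50, 59)
print(missing, tally(missing), "/", tally(11))   # 4 |||| / ||||| ||||| |
```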
Illustration 5. The data given below relate to the heights and weights of
20 persons. You are required to form a two-way frequency table with class intervals
62" to 64", 64" to 66", and so on, and 115 to 125 lb., 125 to 135 lb., and so on.


S. No.   Weight (lb.)   Height (in.)     S. No.   Weight (lb.)   Height (in.)
1        170            70               11       163            70
2        135            65               12       139
3        136            65               13       122            63
4        137            64               14       134            68
5        148            69               15       140            67
6        124            63               16       132            69
7        117            65               17       120            66
8        128            70               18       148            68
9        143            7A               19       129            67
10       129            62               20       152            67
Using the standard deviation and its coefficient, state whether there is greater
variation in height or in weight.* (C.A., May, 1969)


Solution. As per the requirements of the question, the population is to be
divided into five classes according to the heights of the persons included in each group
and six classes according to the weight. There will thus be 5 × 6 = 30 cells.


For tabulating the information in appropriate cells, first, the row to which the
height measurement (say, X) should belong is determined. Afterwards, on a consideration
of the weight (say, Y), the column in which it should be included is determined.
The tabulation is recorded by tally bars. Thus the two-way table shall be prepared
like this :†

TWO-WAY FREQUENCY TABLE SHOWING WEIGHT AND
HEIGHT OF 20 PERSONS

                                    Weight in lb. (Y)
Height in in. (X)   115—125   125—135   135—145   145—155   155—165   165—175   Total
62—64                                                                              3
64—66
66—68
68—70                                                                              4
70—72
Total                   4         5         6         3         1         1       20
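The procedure just described (fix the row from the height, then the column from the weight, and count) can be sketched in Python with `collections.Counter`. Two assumptions are flagged in the code: person 9's height, printed "7A" in the data, is read as 71 (70 would fall in the same 70 to 72 class), and person 12, whose height is missing from the data as reproduced, is set aside rather than guessed. Classes are treated as exclusive (62 up to but not including 64, and so on); the helper name `class_of` is hypothetical.

```python
from collections import Counter

# (serial no., weight in lb., height in inches) from the data above.
# Assumption: person 9's height "7A" is read as 71; person 12's height
# is missing in the source, so it is recorded as None.
persons = [
    (1, 170, 70), (2, 135, 65), (3, 136, 65), (4, 137, 64), (5, 148, 69),
    (6, 124, 63), (7, 117, 65), (8, 128, 70), (9, 143, 71), (10, 129, 62),
    (11, 163, 70), (12, 139, None), (13, 122, 63), (14, 134, 68),
    (15, 140, 67), (16, 132, 69), (17, 120, 66), (18, 148, 68),
    (19, 129, 67), (20, 152, 67),
]


def class_of(value, start, width):
    """Lower limit of the exclusive class start + k*width that contains value."""
    return start + ((value - start) // width) * width


table = Counter()
unplaced = []
for serial, weight, height in persons:
    if height is None:                # no row can be chosen without a height
        unplaced.append(serial)
        continue
    row = class_of(height, 62, 2)     # height classes 62-64, 64-66, ...
    col = class_of(weight, 115, 10)   # weight classes 115-125, 125-135, ...
    table[(row, col)] += 1

for row in range(62, 72, 2):
    cells = [table[(row, col)] for col in range(115, 175, 10)]
    print(f"{row}-{row + 2}", cells, sum(cells))
print("not placed:", unplaced)
```

Reading a missing key from a `Counter` returns 0 without adding it, so empty cells need no special handling.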
TABULATION 


One of the simplest and most revealing devices for summarizing 
data and presenting them in a meaningful fashion is the statistical table. 
A table is a systematic arrangement of statistical data in columns and rows. 


* See Chapter on Measures of Variation. 
† The figures in brackets denote the frequency corresponding to each cell.
Rows are horizontal arrangements, whereas columns are vertical ones.
The purpose of a table is to simplify the presentation and to facilitate
comparisons. The simplification results from the clear-cut and systematic
arrangement, which enables the reader to quickly locate desired information.
Comparison is facilitated by bringing related items of information close
together.


Difference between Classification and Tabulation 


The chapter heading ‘Classification and Tabulation’ should not lead 
the reader to believe that these are two distinct processes. In fact, they
go together, classification being the first step in tabulation. Before the 
data are put in tabular form they have to be classified, i.e., the different 
items having common characteristics must be brought together. It is 
only after this step that the data are displayed under different columns 
and rows so that their relationship can be easily understood. 


Role of Tabulation


Tables make it possible for the analyst to present a huge mass of 
data in a detailed orderly manner within a minimum of space. Because 
of this, tabular presentation is the cornerstone of statistical reporting. 


1. It simplifies complex data. Unnecessary details and repetitions are avoided.
Data are presented systematically in columns and rows. Hence, the reader gets a
very clear idea of what the table represents. There is thus a considerable saving
in the time taken to understand what the data represent, and confusion is avoided.
A large amount of space is also saved because headings and designations are not
duplicated : the description at the top of a column serves for all the items beneath it.


2. It facilitates comparison. Since
a table is divided into various parts and for each part there are totals and
sub-totals, the relationship between different parts of data can be studied
much more easily with the help of a table than without it.


3. It gives an identity to the data. When the data are arranged in 
a table with a title and a number they can be distinctly identified and can
be used as a source reference in the interpretation of a problem. 


4. It reveals patterns. Tabulation reveals patterns within the figures
which cannot be seen in the narrative form. It also facilitates the 
summation of the figures if the reader desires to check the totals. 


Parts of a Table
The number of parts of a table varies from case to case depending
upon the given data. However, the main parts of a table in general are
the following :


* “A statistical table is the logical listing of related quantitative data in vertical
columns and horizontal rows of numbers with sufficient explanatory and qualifying
words, phrases and statements in the form of titles, headings and notes to make clear
the full meaning of data and their origin.”


“A table summarizes the data by using columns and rows and entering figures
in the body of table.” —Karmel
1. Table number
2. Title of the table
3. Caption
4. Stub
5. Body of the table
6. Headnote
7. Footnote.


1. Table Number. Each table should be numbered. There are 
different practices with regard to the place where this number is to be 
given. The number may be given either in the centre at the top above 
the title or in the side of the title at the top or in the bottom of the table 
on the left-hand side. However, if space permits the table number should 
be given in the centre, as is shown in the specimen table given on page
E-5:16. Where there are many columns, it is also desirable to number each
column so that easy reference to it is possible.


2. Title of the Table. Every table must be given a suitable title. 
The title is a description of the contents of the table. A complete title 
has to answer the questions what, where and when in that sequence. In
other words :


(a) What precisely are the data in the table (i.e., what categories of
statistical data are shown) ? 

(b) Where the data occurred (i.e., the precise geographical, political 
or physical area covered) ? 

(c) When the data occurred (i.e., the specific time or period covered
by the statistical material in the table) ? 

The title should be clear, brief and self-explanatory. However, 
clarity should not be sacrificed for the sake of brevity. Long titles cannot 
be read as promptly as short titles, but at times they may have to be 
used for the sake of clarity. The title should be so worded that it permits 
one and only one interpretation. It should be in the form of a series of
phrases rather than complete sentences. Its lettering should be the most 
prominent of any lettering on the table. 


3. Caption. Caption refers to the column headings. It explains
what the column represents. It may consist of one or more column headings.
Under a column heading there may be sub-heads. The caption
should be clearly defined and placed at the middle of the column. If the 
different columns are expressed in different units, the units should be 
mentioned with the captions. As compared with the main part of the 
table the caption should be shown in smaller letters. This helps in 
saving space. 

4. Stub. As distinguished from caption, stubs are the designations
of the rows or row headings. They are at the extreme left and perform the
same function for the horizontal rows of numbers in the table as the
column headings do for the vertical columns of numbers. The stubs are
usually wider than column headings but should be kept as narrow as
possible without sacrificing precision and clarity of statements.

5. Body. The body of the table contains the numerical information.
This is the most vital part of the table. Data presented in the
body are arranged according to the descriptions or classifications of the
captions and stubs.

6. Headnote. It is a brief explanatory statement applying to all
or a major part of the material in the table, and is placed below the title,
centered and enclosed in brackets. It is used to explain certain points
relating to the whole table that have not been included in the title nor in 
the captions or stubs. For example, the unit of measurement is frequently 
written as a headnote, such as “in thousands” or “in million tonnes”, or 
“in crores”, etc. 


7. Footnotes. Anything in a table which the reader may find
difficult to understand from the title, captions and stubs should be
explained in footnotes. If footnotes are needed they are placed directly
below the body of the table. Footnotes are used for four main purposes :

(a) To point out any exceptions as to the basis of arriving at the 
data, for example, sales recorded at ‘ex-factory price’ for some of the 
entries and at ‘delivered price’ for others. Any heterogeneity in the data 
recorded must be disclosed to avoid wrong conclusions. 

(b) To point out any special circumstances affecting the data, for example,
strike, lock-out, fire, etc.

(c) To clarify anything in the table. 

(d) To give the source in case of secondary data. The reference to 
the source should be complete in itself, for example, if the data is obtained 
from some periodical, its name, date of publication, page number, table 
number, etc., should be mentioned so that if the user wishes to check the 
data from the original source or consults later data from the same source,
he will know where to look for the information. 


There are various systems of identifying the footnotes. One is
numbering them consecutively with small numbers 1, 2, 3, 4 or letters
a, b, c, d. Another system identifies the first footnote with one star (*),
the second footnote with two stars (**), the third footnote with three stars (***),
and so on. Sometimes instead of stars another sign, †, is used. However,
where several footnotes are required, it is more convenient to use small
numbers.

The following is a specimen of a table indicating the above parts : 


[Specimen table showing the title and headnote, the caption (column headings)
with the stub heading, the stub, the body, the footnotes, the table number and
the source.]

Format of a Table
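The parts described above can also be assembled mechanically. The following is a minimal sketch in Python that lays out a plain-text table with a table number, title, headnote, caption row (with the stub heading on the left), stub and body, footnotes and source. The function `render_table`, its parameters and the figures in the usage example are all hypothetical, chosen only to show where each part goes.

```python
def render_table(number, title, headnote, stub_head, captions, rows,
                 footnotes=(), source=None):
    """Return a plain-text table: number, title, headnote, caption row
    (with the stub heading on the left), stub and body, footnotes, source."""
    stub_w = max([len(stub_head)] + [len(stub) for stub, _ in rows])
    col_ws = [max([len(cap)] + [len(str(cells[i])) for _, cells in rows])
              for i, cap in enumerate(captions)]
    width = stub_w + sum(w + 3 for w in col_ws)  # " | " before each column

    def line(stub, cells):
        return stub.ljust(stub_w) + "".join(
            " | " + str(c).rjust(w) for c, w in zip(cells, col_ws))

    out = [f"TABLE {number}".center(width), title.center(width)]
    if headnote:
        out.append(f"({headnote})".center(width))
    rule = "-" * width
    out += [rule, line(stub_head, captions), rule]
    out += [line(stub, cells) for stub, cells in rows]
    out.append(rule)
    out += list(footnotes)
    if source:
        out.append(f"Source: {source}")
    return "\n".join(out)


# Hypothetical figures, used only to show the layout.
example = render_table(
    1, "EMPLOYEES ACCORDING TO AGE AND SEX", "in hundreds",
    "Age (in years)", ["Males", "Females", "Total"],
    [("Below 25", [30, 10, 40]), ("25-35", [50, 20, 70])],
    footnotes=["* Illustrative figures only."], source="hypothetical data",
)
print(example)
```

Numbers are right-aligned under their column headings, and the stub is left-aligned, which keeps the body easy to scan down each column.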
General Rules of Tabulation


It is difficult to lay down any hard and fast rules for tabulating data 
because much depends upon the given data and requirements of the 
survey. In fact, constructing a good table is an art and, therefore, practi- 
cal experience is of immense help. Prof. Bowley rightly points out: “In 
collection and tabulation common sense is the chief requisite and experience
the chief teacher.” However, the following general considerations may be 
kept in view while tabulating data : 


1. The table should suit the size of the paper, usually with more
rows than columns. In making a suitable layout it may be necessary to 
alter the original design. The alteration often consists in changing the 
rows to columns or the other way round. For this reason it is desirable 
to make a rough draft of the table before the figures are entered in it. 
Space must be allowed for reference or any other matter which is to be 
included in the table. 


2. In all tables the captions and stubs should be arranged in some 
systematic order. It would make the table easier to read and allow more 
important items to be emphasized. The arrangement of items basically
depends upon the type of data. However, the principal bases for arranging 
items are the following : 

(a) Alphabetical, i.e., arrangement according to alphabets. This
type of arrangement is very common in general purpose or reference
tables.

(b) Chronological, i.e., arrangement according to time. This type
of arrangement is of particular value in presenting historical data. 

(c) Geographical, i.e., arrangement of data in certain territorial units 
such as countries, cities, districts, etc. 

(d) Conventional, or arrangement in a customary order such as men, 
women and children or Hindus, Muslims, Sikhs and Christians.

(e) Items may be arranged according to size, i.e., the numerical
importance of the items, the largest items being given first and the smallest
last. This arrangement may be reversed, if necessary.


3. The unit of measurement should be clearly defined and given in
the table, such as income in rupees or weight in pounds, etc.

4. Figures should be rounded to avoid unnecessary details in the 
table and a footnote to this effect should be given. For example, the figures 
may be taken to the nearest rupee and the paise be eliminated.


5. If certain figures are to be emphasized they should be in distinc- 
tive type or in a ‘box’ or circle or between thick lines.

6. The table should not be overloaded with details. If many 
characteristics are to be shown it is not necessary to load them all in one 
table ; rather a number of tables should be prepared, each table complete
in itself and serving a particular purpose.

7. A column entitled ‘miscellaneous’ should be added for
data which do not fit in the classification made.

8. The arrangement of the table should be logical and items related
to each other should be placed near one another and, if possible, in the same
group. Derivative figures such as totals, averages and percentages should
be placed near the original figures. Columns and rows should be numbered
for identification since reference is more easily given by quoting numbers
than the title of the column.

9. Percentages and ratios should be computed and shown, if neces-
sary. Frequently, figures in tables become more meaningful if they are
expressed as percentages or (less often) as ratios. In constructing a table,
therefore, it is important to decide whether or not it can be improved in
this way. If it can, an additional column should be inserted in the table and
the percentages (or ratios) computed and entered. Such percentages and
ratios are sometimes called derived statistics.

10. Where standard classifications have been prepared it is usually 
desirable to employ them, as they are superior to hastily constructed 
individual classifications. 

11. Indicate a zero quantity by a zero, and do not use zero to
indicate information which is not available. If it is not available,
show this fact by the letters N.A. or by a dash (—).
12. Abbreviations should be avoided, especially in titles and headings.
For example, “yr.” should not be used for “year”.

13. Be explicit. The expression "etc." is bad form in a table, since 
the reader may not readily discover what it refers to. In fact clarity is 
the most important feature of tabular presentation of any kind of statistical 
data. 

14. Do not use ditto marks. If a figure is repeated, show it each
time. A ditto mark may be mistaken for the figure “11”.


All these guidelines may be difficult to follow in a particular case. 
But their purpose should be kept in mind. J.C. Capt has beautifully 
summarized this discussion in these words : “In the final analysis there
are only two rules in tabular presentation that should be applied rigidly. 
First, the use of common sense when planning a table, and second the 
viewing of the proposed table from the standpoint of the user. The 
details of mechanical arrangement must be governed by a single objec- 
tive, that is, to make the statistical table as easy to read and to understand 
as the nature of the material will permit.” 
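Rules 4, 9 and 11 can be combined in a short sketch: each figure is expressed as a percentage of the total and rounded to one decimal place, a true zero stays a zero, and a figure that is not available (here `None`) is shown as N.A. rather than 0. The function name and the figures are hypothetical; computing percentages on the total of the available figures only is an assumption that a footnote to such a table would need to state.

```python
def percentage_column(values, places=1):
    """Express each available figure as a percentage of the total of the
    available figures, rounded to `places` decimals.

    None marks a figure that is not available and is shown as "N.A.";
    a genuine zero is kept as a zero percentage.
    """
    available = [v for v in values if v is not None]
    total = sum(available)
    if available and total == 0:
        raise ValueError("available figures total zero; percentages undefined")
    return ["N.A." if v is None else f"{100 * v / total:.{places}f}"
            for v in values]


# Hypothetical figures: the second item is a true zero, the third is not available.
print(percentage_column([120, 0, None, 80]))  # ['60.0', '0.0', 'N.A.', '40.0']
```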
Review of the Table 

Before a table is released it should be reviewed for form, content, 
validity and clerical accuracy. It is difficult for the person preparing the 
table to make a thoroughly satisfactory check on all of the four aspects. 
Having prepared the table and done his best, he can hardly review it
objectively. He should, if possible, have his work reviewed by someone
with experience.
In the case of a summary table the reviewer should ask himself the
following questions to determine whether or not the table is satisfactory :

(1) Does the title clearly state what is in the table ?

(2) Are all the entries pertinent ?

(3) Is there unity of subject-matter ?

(4) Are the classifications arranged so as to focus attention on the
main comparison ?

(5) Are the data arranged so as to emphasise important points ?

(6) Does the table include adequate interpretative figures such as 
totals, percentages and averages ? 
(7) Are there notations about peculiarities of the data ?

(8) Is the source stated properly ? 

(9) Is the table in good form, so that it presents an attractive
appearance ?*
Types of Tables 

Tables may be broadly classified into two categories :


1. Simple and complex tables.
2. General purpose and special purpose (or summary) tables. 


1. Simple and Complex Tables. The distinction between simple 
and complex tables is based upon the number of characteristics studied. 


In a simple table only one characteristic is shown. Hence, this type 
of table is also known as one-way table. In a complex table, on the other 
hand, two or more characteristics are shown. Such tables are more
popular in practice because they enable full information to be incorporated
and facilitate a proper consideration of all related facts. When two
characteristics are shown such a table is known as a two-way table or double
tabulation. When three characteristics are shown in a table, this type of
tabulation is known as treble tabulation. When four or more characteristics
are simultaneously shown it is a case of manifold tabulation. The
following examples will illustrate the distinction between simple and 
complex tables : 


(i) Simple table or one-way table. In this type of table only one 
characteristic is shown. This is the simplest type of table. The following
is the illustration of such a table :


NUMBER OF EMPLOYEES IN STATE BANK ACCORDING TO AGE GROUP 


Age (in years)      No. of Employees
Below 25
25—35
35—45
45—55
Above 55
Total
(ii) Two-way table. Such a table shows two characteristics and is 
formed when either the stub or the caption is divided into two co-ordinate 
parts. The following example illustrates the nature of such a table : 


NUMBER OF EMPLOYEES OF STATE BANK IN DIFFERENT
AGE-GROUPS ACCORDING TO SEX

                            Employees
Age (in years)      Males      Females      Total
Below 25
25—35
35—45
45—55
Above 55
Total


*Spurr and Smith — Business and Economic Statistics. 
(iii) Higher order table. When three or more characteristics are
represented in the same table, such a table is called a higher order table.
The need for such a table arises when we are interested in presenting a
number of characteristics simultaneously. While constructing such a table,
it should be remembered that as the number of characteristics
represented increases, the table becomes more and more confusing and as
such normally not more than four characteristics should be represented in
the same table. Where more than four characteristics are to be represented
we can have more than one table depicting relationship between the 
attributes. In the following two illustrations, three and four characteristics 
are represented respectively. 


Three characteristics : 


NUMBER OF EMPLOYEES OF STATE BANK
ACCORDING TO AGE-GROUPS, SEX AND RANKS

                                   Ranks
                  Supervisors            Assistants            ...
Age (in years)    M     F     Total      M     F     Total     ...
Below 25
25—35
35—45
45—55
Above 55
Total

Note. M indicates males and F females.
Four Characteristics : 


NUMBER OF EMPLOYEES OF STATE BANK ACCORDING TO
RELIGION, AGE, RANK AND SEX

Religion     Age (in years)
Hindus       Below 25          ...
             25—35
             35—45
             45—55             ...
             55 & Above
             Total
Muslims      Below 25
             25—35
             35—45
             45—55             ...
             55 & Above        ...
             Total
Notes. 1. The table can be extended to show other religions such as Sikhs,
Christians, etc.


2. M indicates males and F indicates females. 


2. General and Special Purpose Tables. General purpose
tables, also known as reference tables or repository tables, provide
information for general use or reference. They usually contain detailed
information and are not constructed for specific discussion. In other words, these
tables serve as a repository of information and are arranged for easy
reference. Tables published by the governmental agencies are mostly of this
kind, such as the tables contained in the Statistical Abstract of the Indian
Union, detailed tables contained in the census reports, etc. Such tables
present facts that are not meant for any particular discussion. When such tables are
used by a researcher, they are usually placed in the appendix of the report
for easy reference.


Special purpose tables, also known as summary tables, provide infor- 
mation for particular discussion. When attached to a report they are 
found in the body of the text. These tables are also called derivative 
tables since they are often derived from general tables. Thus the large 
detailed tables in the census records of the Government of India are general
purpose tables. When such data are used, they are ordinarily taken from 
the general purpose tables and presented as special purpose tables, which 
emphasize the relation the user wishes to stress. A special purpose table
should be designed in such a way that the reader may easily refer to the 
table for comparison, analysis or emphasis concerning the particular 
discussion. The tables used in this chapter are of this kind. 
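The difference between one-way, two-way and higher order tables is only the number of characteristics by which the items are counted, and a smaller summary table can be derived from a larger one by adding over the characteristics it leaves out. A minimal sketch with Python's `collections.Counter`; the employee records are hypothetical and the helper names `cross_tab` and `collapse` are assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical employee records: (age group, sex, rank). Illustrative only.
records = [
    ("Below 25", "M", "Assistant"), ("Below 25", "F", "Assistant"),
    ("25-35", "M", "Supervisor"), ("25-35", "M", "Assistant"),
    ("25-35", "F", "Assistant"), ("35-45", "M", "Supervisor"),
]


def cross_tab(records, *positions):
    """Count records by the characteristics at the given positions:
    one position gives a one-way table, two a two-way table, and so on."""
    return Counter(tuple(r[p] for p in positions) for r in records)


def collapse(table, keep):
    """Derive a smaller (summary) table by adding up a higher-order table
    over every characteristic not listed in `keep` (indices into the key)."""
    summary = Counter()
    for key, n in table.items():
        summary[tuple(key[i] for i in keep)] += n
    return summary


one_way = cross_tab(records, 0)           # age only
two_way = cross_tab(records, 0, 1)        # age x sex
three_way = cross_tab(records, 0, 1, 2)   # age x sex x rank

print(two_way[("25-35", "M")], three_way[("25-35", "M", "Supervisor")])
print(collapse(three_way, (0, 1)) == two_way)
```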


Illustration 6. Point out the mistakes in the following table drawn to show the
distribution of population according to sex, age and literacy :

Sex      0 to 25      25 to 50      50 to 75      75 to 100

(B. Com., Lucknow, 1971)

Solution. Not all the characteristics are revealed in the above table; the
characteristic of literacy has been completely ignored. Even otherwise the table needs
to be re-arranged as follows :


TABLE SHOWING THE DISTRIBUTION OF POPULATION
ACCORDING TO AGE, SEX AND LITERACY

                      Literates                       Illiterates
Age group      Males    Females    Total       Males    Females    Total
0 to 25
25 to 50
50 to 75
75 to 100
Total
Illustration 7. Draft a blank table to show the distribution of personnel in a
manufacturing concern according to :
(a) Sex : males and females.
(b) Three grades of salary : below Rs. 300, Rs. 300—500, Rs. 500 and over.
(c) Two periods : 1975 and 1976.
(d) Three age groups : below 25, 25 and under 40, 40 and over.
Solution: TABLE SHOWING DISTRIBUTION OF PERSONNEL ACCORDING 
TO SEX, SALARY AND AGE-GROUPS FOR TWO YEARS 
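                                             Salary Grade
                          Below Rs. 300      Rs. 300—500      Rs. 500 and over      Total
Year    Age group         M   F   Total      M   F   Total     M   F   Total        M   F   Total
1975    Below 25
        25 and under 40
        40 and over
        Total
1976    Below 25
        25 and under 40
        40 and over
        Total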


Illustration 8. Present the following information in a suitable tabular form.


In 1960 out of a total of 1,750 workers of a factory, 1,200 were members of a
trade union.

The number of women employed was 200, of which 175 did not belong to a
trade union. In 1965 the number of union workers increased to 1,580, of which 1,290
were men. On the other hand, the number of non-union workers fell to 208, of
which 180 were men.

In 1970, there were 1,800 employees who belonged to a trade union and 50 who
did not belong to a trade union. Of all the employees in 1970, 300 were women of whom
only 8 did not belong to a trade union.


Solution: TABLE SHOWING THE SEX-WISE DISTRIBUTION OF UNION
AND NON-UNION MEMBERS FOR 1960, 1965 AND 1970

                        1960                     1965                     1970
Category          M      F     Total       M      F     Total       M      F     Total
Members         1,175    25    1,200     1,290   290    1,580     1,508   292    1,800
Non-members       375   175      550       180    28      208        42     8       50
Total           1,550   200    1,750     1,470   318    1,788     1,550   300    1,850

M = Males, F = Females
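Each year's block of this table is a small table in which every row and every column satisfies males + females = total, so any cell can be found by subtraction once the other two entries of its row or column are known. The sketch below (the helper name `complete_table` is hypothetical) applies that rule repeatedly to the facts stated in the question. The 1970 union membership is taken as 1,800, the figure consistent with the totals in the solution table.

```python
ROWS = ("Members", "Non-members", "Total")
COLS = ("M", "F", "Total")


def complete_table(cells):
    """Fill unknown cells of a 2 x 2 table with margins by subtraction.

    cells maps (row, col) to a number; missing keys are unknown. Every row
    and every column obeys first + second == total, so whenever exactly one
    entry of such a triple is unknown it can be solved for.
    """
    triples = ([[(r, c) for c in COLS] for r in ROWS]
               + [[(r, c) for r in ROWS] for c in COLS])
    changed = True
    while changed:
        changed = False
        for a, b, t in triples:
            unknown = [k for k in (a, b, t) if k not in cells]
            if len(unknown) != 1:
                continue
            if t in unknown:
                cells[t] = cells[a] + cells[b]
            elif a in unknown:
                cells[a] = cells[t] - cells[b]
            else:
                cells[b] = cells[t] - cells[a]
            changed = True
    return cells


# Facts stated in the question for each year.
given = {
    1960: {("Total", "Total"): 1750, ("Members", "Total"): 1200,
           ("Total", "F"): 200, ("Non-members", "F"): 175},
    1965: {("Members", "Total"): 1580, ("Members", "M"): 1290,
           ("Non-members", "Total"): 208, ("Non-members", "M"): 180},
    # Assumption: 1970 members taken as 1,800 (matches the solution table).
    1970: {("Members", "Total"): 1800, ("Non-members", "Total"): 50,
           ("Total", "F"): 300, ("Non-members", "F"): 8},
}
tables = {year: complete_table(dict(facts)) for year, facts in given.items()}

for year, t in tables.items():
    print(year, [[t[(r, c)] for c in COLS] for r in ROWS])
```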
MISCELLANEOUS ILLUSTRATIONS 
Illustration 9. In a trip organized by a college there were 80 persons, each of
whom paid Rs. 15.50 on an average. There were 60 students, each of whom paid
Rs. 16. Members of the teaching staff were charged at a higher rate. The number of
servants was 6 (all males) and they were not charged anything. The number of ladies
was 20% of the total, of whom one was a lady staff member.

Tabulate the above information. (B. Com., Bombay, 1973)
Solution : TABLE SHOWING THE TYPE OF PARTICIPANT,
SEX AND CONTRIBUTION MADE

Type of                    Sex                  Contribution per     Total
Participants      Males   Females   Total      member (Rs.)         Contribution (Rs.)
Students            45      15        60          16.00                   960
Teaching Staff      13       1        14          20.00                   280
Servants             6      —          6          —                       —
Total               64      16        80          15.50                 1,240

Note 1. Total contribution = Average contribution × No. of persons who
joined the trip
= 15.50 × 80 = Rs. 1,240

2. Contribution of the staff per head has been obtained by deducting
the contribution of students from the total and dividing the difference
by the number of teaching staff, i.e.,
(1,240 − 60 × 16) / 14 = (1,240 − 960) / 14 = 280 / 14 = Rs. 20
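The derived figures in the table and notes follow from a few lines of arithmetic, sketched below in Python. The number of teaching staff is not stated directly in the question; here it is obtained as the remainder after students and servants (80 − 60 − 6 = 14). Since the servants are all males, the 16 ladies (20% of 80) are assumed to be the lady staff member plus female students.

```python
persons, average_fee = 80, 15.50
students, student_fee = 60, 16
servants = 6                  # all males, not charged
lady_staff = 1

total_contribution = persons * average_fee                          # 1,240
staff = persons - students - servants                               # 14 (derived)
staff_fee = (total_contribution - students * student_fee) / staff   # 280 / 14
ladies = persons * 20 // 100                                        # 20% of 80 = 16
female_students = ladies - lady_staff                               # 15
male_students = students - female_students                          # 45
male_staff = staff - lady_staff                                     # 13
males = male_students + male_staff + servants                       # 64

print(total_contribution, staff, staff_fee, males, ladies)
```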


Illustration 10. In certain data, the following four main characteristics with their 
sub-characteristics are present : 


Main Characteristics Sub Characteristics 
Locality Urban, Rural 
Religion Hindu, Non-Hindu 
Sex Male, Female 
Age 0—30, 30—60, 60 and Over 
Prepare a suitable form of table. (B. Com., Poona, 1973)


Solution. Please see page E-5:23.

Illustration 11. Draw a blank table to present the information regarding the
college students according to : 

(a) Faculty—Art, Commerce, Science. 

(b) Class—Degree and Pre-University Class. 

(c) Sex—Male and Female. 

(d) Age—below 20, above 20.

(e) For 2 years—1970 and 1975. (B. Com., Marathwada, 1975)


Solution. Please see page E-5:23. 


Illustration 12. In a sample study about the tea habits in two villages the
following data were observed :


Village A                          Village B
70% persons were males             55% persons were males
80% were tea drinkers, and         35% were tea drinkers, and
62% were male tea drinkers         25% were male tea drinkers


Tabulate the above data.
Solution
TABLE SHOWING PERCENTAGE OF TEA DRINKERS IN VILLAGES A AND B

                          Village A                     Village B
Attributes           Males   Females   Total       Males   Females   Total
Tea Drinkers           62      18        80          25      10        35
Non-Tea Drinkers        8      12        20          30      35        65
Total                  70      30       100          55      45       100

Note : Figures not given directly in the question have been derived on the basis of the information given.
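The completed table follows from the three percentages given for each village: female tea drinkers = tea drinkers − male tea drinkers, male non-drinkers = males − male tea drinkers, and female non-drinkers = females − female tea drinkers. A minimal sketch with a hypothetical function name, which also rejects percentages that would give a negative cell:

```python
def complete_tea_table(pct_males, pct_tea, pct_male_tea):
    """Complete the percentage table from the three percentages given.

    Returns {row: (males, females, total)}, every figure being a
    percentage of the village population.
    """
    female_tea = pct_tea - pct_male_tea
    male_non = pct_males - pct_male_tea
    female_non = (100 - pct_males) - female_tea
    if min(female_tea, male_non, female_non) < 0:
        raise ValueError("the given percentages are inconsistent")
    return {
        "Tea Drinkers": (pct_male_tea, female_tea, pct_tea),
        "Non-Tea Drinkers": (male_non, female_non, 100 - pct_tea),
        "Total": (pct_males, 100 - pct_males, 100),
    }


village_a = complete_tea_table(70, 80, 62)
village_b = complete_tea_table(55, 35, 25)
for row in village_a:
    print(f"{row:<17}", village_a[row], village_b[row])
```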

Illustration 13. Out of a total number of 1,807 women who were interviewed
for employment in a textile factory of Bombay, 512 were from textile areas and the rest
from the non-textile areas. Amongst the married women who belonged to textile areas,
247 were experienced and 73 inexperienced, while for non-textile areas, the
corresponding figures were 49 and 520. The total number of inexperienced women was 1,341, of
whom 111 resided in textile areas. Of the total number of women, 918 were unmarried,
and of these the number of experienced women in the textile and non-textile areas was
154 and 16 respectively. Tabulate.
(B. Com., Delhi, 1972)
Solution: TABLE SHOWING THE MARITAL STATUS OF 1,807 WOMEN
RESIDING IN TEXTILE AND NON-TEXTILE AREAS

                     Textile Areas          Non-Textile Areas             Total
                   M      U    Total       M      U     Total       M      U     Total
Experienced       247    154    401        49     16       65      296    170      466
Inexperienced      73     38    111       520    710    1,230      593    748    1,341
Total             320    192    512       569    726    1,295      889    918    1,807

M indicates Married and U Unmarried.
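Before release, a table like this should be checked for clerical accuracy, as the review questions earlier in the chapter require. A simple check is that, in each of the three blocks, every row and every column adds up to its total, and that the textile and non-textile figures add up, cell by cell, to the overall figures. A sketch in Python; the function name `check_totals` is hypothetical.

```python
def check_totals(table):
    """Return the inconsistencies in a table whose last column and last row
    are totals (an empty list means every total agrees)."""
    problems = []
    for i, row in enumerate(table):
        if sum(row[:-1]) != row[-1]:
            problems.append(f"row {i}: {row[:-1]} add up to {sum(row[:-1])}, not {row[-1]}")
    for j in range(len(table[0])):
        column = [row[j] for row in table]
        if sum(column[:-1]) != column[-1]:
            problems.append(f"column {j}: {column[:-1]} add up to {sum(column[:-1])}, not {column[-1]}")
    return problems


# Rows: Experienced, Inexperienced, Total; columns: Married, Unmarried, Total.
textile = [[247, 154, 401], [73, 38, 111], [320, 192, 512]]
non_textile = [[49, 16, 65], [520, 710, 1230], [569, 726, 1295]]
overall = [[296, 170, 466], [593, 748, 1341], [889, 918, 1807]]

problems = check_totals(textile) + check_totals(non_textile) + check_totals(overall)
# The two areas together must reproduce the overall figures, cell by cell.
problems += [f"cell {i},{j}: {a} + {b} != {c}"
             for i, (ra, rb, rc) in enumerate(zip(textile, non_textile, overall))
             for j, (a, b, c) in enumerate(zip(ra, rb, rc)) if a + b != c]
print(problems or "all totals agree")
```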


Illustration 14. Following figures give the ages of newly married husbands and 
their wives in years. Represent the data by a frequency distribution. 


Age of husband Age of wife Age of husband Age of wife 

24 17 25 17 

26 18 26 18 

27 19 27 19 

25 17 25 19 

28 20 27 20 

24 18 26 19 

27 18 25 17 

28 19 26 20 

25 18 26 17 

26 19 26 18 

(B. Com., Delhi, 1975)
Solution :
FREQUENCY DISTRIBUTION OF THE AGE OF HUSBANDS AND WIVES

Age of                       Age of husbands                           Total
wives        24         25         26         27         28
17         | (1)     ||| (3)     | (1)                                    5
18         | (1)     | (1)       ||| (3)    | (1)                         6
19                   | (1)       || (2)     || (2)     | (1)              6
20                               | (1)      | (1)      | (1)              3
Total        2          5          7          4          2               20

Note. The figures in brackets indicate the frequencies of the respective cells.
Machine Tabulation. This method of tabulating information is
generally used in case of extensive large-scale surveys. Mechanical sorting
and tabulating equipment and supplies are furnished by two concerns :
The International Business Machines Corporation, and Remington Rand.
Each set of equipment consists of one or more key punches, a sorting
machine and a tabulating machine.


Mechanical sorting and tabulation are done with the help of cards 
known as ‘punch card’, ‘tabulating card’ or ‘code card’. The data from
the field forms are first recorded on these cards by running them through a 
‘punching machine’, which punches holes in the card in certain positions 
determined by a pre-arranged code. (The position of a hole in the card 
represents a figure). Other machines then sort and count the cards and print 
or otherwise record the results as well as check the work for consistency and 
accuracy. With the help of these machines an amazing amount and variety 
of data can be tabulated within a very short time. These machines are
constantly being improved upon to make them more rapid, accurate, 
versatile and automatic. 


Advantages of mechanical tabulation : 

(i) Data can be tabulated in a very short time.

(ii) Extensive and large-scale surveys can be conveniently handled. 
(iii) Greater accuracy of the results is ensured. 


(iv) Uninteresting and monotonous work is transferred to machines.
There is very little human labour involved.


(v) There is considerable reduction in costs. 


The work of the statisticians can benefit greatly from the present and 
anticipated improvements in data processing. It has been estimated that a 
business statistics report, which required 1,800 man-hours to prepare on the
basis of manual tabulation, required 100 hours with punched card equip- 
ment and only 12 hours on a medium sized computer. Many complex
procedures, such as the fitting of complex trend lines and seasonal adjust- 
ments which were prohibitive in terms of time and cost, can now be 
carried out in a few moments of computer time. 
6 Diagrammatic and
Graphic Presentation 


In the previous chapter we have discussed the techniques of
classification and tabulation that help in summarizing the collected data 
and presenting them in a systematic manner. However, these forms of 
presentation do not always prove to be interesting to the common man. 
Too many figures are often confusing and may fail to convey the message
effectively to those for whom they are meant.


One of the most convincing and appealing ways in which statistical
results may be presented is through diagrams and graphs. Evidence of
this can be found in the financial pages of newspapers and political
journals, advertisements, etc. There are numerous ways in which statistical
data may be displayed pictorially such as different types of diagrams, 
graphs and maps. Very often the problem is that of selecting the best out 
of several that may be available. This isa difficult task and requires a 
great deal of artistic talent and imagination on the part of the individual 
or agency engaged in the preparation of diagrams and graphs. It is not 
practicable to discuss all possible forms of charts here. An attempt is made 
in this chapter to illustrate some of the major types of diagrams, graphs 
and maps frequently wed in presenting statistical data. 


Significance of Diagrams and Graphs 


Diagrams and graphs are extremely useful because of the following 
reasons : 


1. They give a bird's-eye view of the entire data and, therefore, the
information presented is easily understood. It is a fact that as the number
and magnitude of figures increase they become more confusing and their
analysis tends to be more strenuous. Pictorial presentation helps in proper
understanding of the data as it gives an interesting form to the data. The
old saying 'A picture is worth 10,000 words' is very true. The mind
through the eye can more readily appreciate the significance of figures in
the form of pictures than it can follow the figures themselves.


2. They are attractive to the eye. Figures are dry but diagrams
delight the eye. For this reason diagrams create greater interest than cold
figures. Thus while going through journals and newspapers, the readers
generally skip over the figures but most of them do look at the diagrams
and graphs. Since diagrams have attraction value, they are very popular
in exhibitions, fairs, conferences, board meetings and public functions.


3. They have a great memorizing effect. The impressions created 
by diagrams last much longer than those created by the figures presented 
in a tabular form. 


4. They facilitate comparison of data relating to different periods of 
time or different regions. Diagrams help one in making quick and accurate 
comparison of data. They bring out hidden facts and relationships and 
can stimulate as well as aid analytical thinking and investigation. 
Comparison of Tabular and Diagrammatic Presentation

Data may be presented in the form of tables as well as diagrams and 
graphs. Both forms of presentation have their own usefulness for particular 
purposes. Hence, the choice of the form of presentation must be made 
with due thought and care. The following points may be kept in view in 
this connection : 

1. Tables contain precise figures whereas diagrams give only an 
approximate idea. Exact values can be read from a table. 

2. More information can be presented in one table than either in 
one graph or diagram. 

3. Tables usually require much closer reading and are more difficult
to interpret than diagrams.

4. Graphs and diagrams have a visual appeal and, therefore, prove
to be more impressive to laymen.

5. Charts have the advantage of showing trends and comparisons 
more vividly than the abstract figures in tables. More people are visual 
minded and prefer graphs to figures. 

Often the aim is to attract the attention and arouse the interest of
readers as well as to give precise information. Since precise information
cannot be obtained from a graph, one may employ both tabular and
graphic means of presentation.


Difference between Diagrams and Graphs

The question is how to distinguish a diagram from a graph. Though
there is no clear-cut line of demarcation between the two, the following
points of difference may be noted :

1. For constructing a graph we generally make use of graph paper
whereas a diagram is generally constructed on plain paper. Moreover,
a graph represents a mathematical relationship (though not necessarily
a functional one) between two variables whereas a diagram does not.

2. Diagrams are more attractive to the eye and as such are better
suited for publicity and propaganda. They do not add anything to
the meaning of the data and, therefore, from the point of view of a
statistician or research worker they are not helpful in analysis. Graphs, on
the other hand, are very much used by the statistician and the research
worker in analysis. In fact, these days it is difficult to find any research
work without graphic support.

3. For representing frequency distributions and time series, graphs 
are more appropriate than diagrams. In fact, for presenting frequency 
distributions diagrams are rarely used. 


General Rules for Constructing Diagrams 

The following general rules should be observed while constructing
diagrams :

1. Title. Every diagram must be given a suitable title. The title
should convey in as few words as possible the main idea that the
diagram intends to portray. However, brevity should not be secured
at the cost of clarity or omission of essential details. The title may be
given either at the top of the diagram or below it.


2. Proportion between width and height. A proper proportion
between the height and width of the diagram should be maintained. If
either the height or width is too short or too long in proportion, the
diagram would give an ugly look. While there are no fixed rules about
the dimensions, a convenient standard suggested by Lutz in his book
entitled Graphic Presentation may be adopted for general use. It is known
as 'Root-two', that is, a ratio of 1 (short side) to 1.414 (long side).
Modifications may, no doubt, be made to accommodate a diagram in the
space available.
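Under the root-two standard the long side is simply the short side multiplied by √2 (about 1.414). The sketch below only illustrates that arithmetic; the 10 cm short side and the function name are assumed for the example, not taken from the text.

```python
import math

# Root-two standard: long side = short side x sqrt(2), i.e. about 1.414.
ROOT_TWO = math.sqrt(2)


def root_two_dimensions(short_side: float) -> tuple:
    """Return (short side, long side) of a diagram drawn in root-two proportion."""
    long_side = round(short_side * ROOT_TWO, 2)
    return short_side, long_side


# Assumed example: a diagram whose short side is 10 cm.
print(root_two_dimensions(10))  # (10, 14.14)
```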

3. Selection of scale. The scale showing the values should be in
even numbers or in multiples of five or ten, e.g., 25, 50, 75 or 20, 40, 60.
Odd values like 1, 3, 5, 7 should be avoided. Again, no rigid rules can be
laid down about the number of rulings on the amount scale, but ordinarily
it should not exceed five. The scale should also specify the size of the unit
and what it represents, for example, 'millions of tonnes', 'number of
persons in thousands', 'units produced in lakhs', etc. All lettering should
be easily readable without turning the chart sideways.

4. Footnote. In order to clarify certain points about the diagram, a
footnote may be given at the bottom of the diagram.

5. Index. An index illustrating different types of lines or different
shades or colours should be given so that the reader can easily make out the
meaning of the diagram.

6. Neatness and cleanliness. Diagrams should be absolutely neat 
and clean. 

7. Simplicity. Diagrams should be as simple as possible so that the
reader can understand their meaning clearly and easily. For the sake of
simplicity, it is important that too much material should not be loaded in
a single diagram; otherwise it may become too confusing and prove
worthless. Several simple charts are much better and more effective than one
or two complex ones which present the same material in a confusing way.

TYPES OF DIAGRAMS 


In practice a very large variety of diagrams are in use and new ones
are constantly being added. It would be outside the scope of this book to
deal exhaustively with the subject and as such only the more frequently used
diagrams are discussed. For the sake of convenience and simplicity they
may be divided under the following heads :
I. One-dimensional diagrams, e.g., bar diagrams. 

II. Two-dimensional diagrams, e.g., rectangles, squares and circles.

III. Three-dimensional diagrams, e.g., cubes, cylinders and spheres. 

IV. Pictograms and cartograms. 

Each of these types is discussed in detail in the following pages. 


I. One-dimensional or Bar Diagrams 
Bar diagrams are the most common type of diagrams used in
practice. A bar is a thick line whose width is shown merely to attract attention.
They are called one-dimensional because it is only the length of the bar
that matters and not the width. When the number of items is large, lines
may be drawn instead of bars to economise space. The special merits of
bar diagrams are the following :
(i) They are readily understood even by those unaccustomed to
reading charts or those who are not chart-minded. 

(ii) They possess the outstanding advantage that they are the 
simplest and the easiest to make. 

(iii) When a large number of items are to be compared they are
the only form that can be used effectively.

While constructing bar diagrams the following points should be kept
in mind :

(i) The width of the bars should be uniform throughout the 
diagram. 

(ii) The gap between one bar and another should be uniform 
throughout. 

(iii) Bars may be either horizontal or vertical. The vertical bars
should be preferred because they give a better look and also facilitate 
comparison. 

(iv) While constructing the bar diagram, it is desirable to write the
respective figure at the end of each bar so that the reader can know the
precise value without looking at the scale. This is particularly so where
the scale is too narrow, for example, where 1 inch on paper represents 10 crore
people. The two diagrams below would clarify the difference.


[Figures: two bar diagrams of Sales of Firm A (in lakhs of rupees)]

From the first diagram one can easily make out the exact sales, but from the
second it is difficult.


Types of Bar Diagram :
Bar diagrams are of the following types :
(a) Simple bar diagrams (b) Sub-divided bar diagrams*
(c) Multiple bar diagrams (d) Percentage bar diagrams
(e) Deviation bars.

*Such diagrams are also known as component bar diagrams.


(a) Simple Bar Diagrams : 

A simple bar diagram is used to represent only one variable. For
example, the figures of sales, production, population, etc., for various years
may be shown by means of a simple bar diagram. Since the bars are of
the same width and only the length varies, it becomes very easy for the
reader to study the relationship. Simple bar diagrams are very popular in
practice. However, an important limitation of such diagrams is that they
can present only one classification or one category of data. For example,
while presenting the population for the last five decades, one can only
depict the total population in a simple bar diagram, and not its sex-wise
distribution.

Illustration 1. The following table gives the birth rate per thousand of different
countries over a certain period :

Country        Birth Rate        Country        Birth Rate
India          3                 China          40
Germany        16                New Zealand    30
U.K.           20                Sweden         15

Represent the above data by a suitable diagram. (B. Com., Poona, 1969)


Solution : 


[Figure: Simple bar diagram of the birth rates of India, Germany, U.K., China, New Zealand and Sweden]


(b) Sub-divided Bar Diagrams 

A sub-divided bar diagram is used to represent the total of a variable
together with its component parts: each bar stands for the total and is
divided into segments in proportion to the values of the components. For
example, the number of students in various courses like B.Com., M.Com.,
B.A., M.A., in various colleges may be represented by a sub-divided bar
diagram. While constructing such a diagram, the various components in
each bar should be kept in the same order. A common and helpful
arrangement is that of presenting each bar in the order of magnitude from
the largest component at the base of the bar to the smallest at the end.
To distinguish between the different components, it is useful to use
different shades or colours. An index or key should be given explaining
these differences.

Sub-divided bar diagrams should not be used where the number of
components is more than 10 or 12, for in that case the diagram will be
overloaded with information which cannot be easily compared and
understood.

The component bar diagram can be used to represent either the
absolute data or distribution ratios such as percentage distributions. It is,
in fact, an excellent method for presenting a set of distribution ratios
diagrammatically.*

Illustration 2. During 1968-69 to 1970-71 the numbers of students in University
'X' were as follows. Represent the data by a suitable diagram.

Year        Arts       Science     Law       Total
1968-69     20,000     10,000      5,000     35,000
1969-70     26,000     9,000       7,000     42,000
1970-71     31,000     9,500       7,500     48,000


Solution. The above data can best be represented by a sub-divided bar diagram.


[Figure: Number of students in Arts, Science and Law in University 'X' (sub-divided bar diagram)]
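The points at which each segment of a sub-divided bar starts and ends are running totals of the components. A minimal sketch using the data of Illustration 2, stacking Arts, Science and Law in that order (largest at the base, as recommended above); the function name is illustrative only.

```python
# Segment limits for a sub-divided (component) bar diagram, Illustration 2 data.
students = {
    "1968-69": {"Arts": 20_000, "Science": 10_000, "Law": 5_000},
    "1969-70": {"Arts": 26_000, "Science": 9_000, "Law": 7_000},
    "1970-71": {"Arts": 31_000, "Science": 9_500, "Law": 7_500},
}


def segment_limits(components):
    """Return (component, bottom, top) for each segment, stacked from the base."""
    limits = []
    bottom = 0
    for name, value in components.items():  # dicts keep insertion order
        limits.append((name, bottom, bottom + value))
        bottom += value
    return limits


for year, parts in students.items():
    print(year, segment_limits(parts))
```

The top of the last segment equals the year's total, which is a convenient check on the table.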


Illustration 3. Construct a component bar diagram from the following data :

Year     Public Cos.     Private Cos.     Total
1962     5,000           20,000           25,000
1963     4,000           16,000           20,000
1964     6,000           18,000           24,000
1965     7,000           21,000           28,000
1966     5,000           15,000           20,000

(B. Com., Nagpur, 1972)


*The other alternatives for this purpose are the relative pie diagram and the
relative component line chart. The latter can be used only in cases where the
classification is chronological. When the number of time periods is not large, the
relative bar chart is undoubtedly the superior of these diagrammatic methods.
Solution :

[Figure: Component bar diagram showing the number of private and public companies, 1962-1966]


(c) Multiple Bars

In a multiple bar diagram two or more sets of inter-related data are
represented. The technique of drawing such a diagram is the same as
that of a simple bar diagram. The only difference is that since more than
one phenomenon is represented, different shades, colours, dots or crossings
are used to distinguish between the bars. Wherever a comparison between
two or more related variables is to be made, a multiple bar diagram should
be preferred.


Illustration 4. Represent the following data diagrammatically :
PRODUCTION OF WHEAT, RICE AND SUGAR FOR 1962-1966 IN INDIA
(in million tonnes)

         1962     1963     1964     1965     1966
Wheat    10       12.3     13.0     12.0     14
Rice     8        10.0     10.5     11.0     13
Sugar    ...      ...      9.0      8.5      8


Solution : 


[Figure: Production of wheat, rice and sugar in India during 1962-66 (multiple bar diagram)]
(d) Percentage Bars
Percentage bars are particularly useful in statistical work which
requires the portrayal of relative changes in data. When such diagrams
are prepared, the length of the bars is kept equal to 100 and segments are
cut in these bars to represent the components (percentages) of an aggregate.
Illustration 5. Represent the following data by sub-divided bars drawn on a
percentage basis.
The cost, sale proceeds and profit or loss per chair during 1972, 1973 and 1974
are as follows :
                                  1972        1973      1974
(a) Wages                         4.0         5.0       5.5
(b) Polishing                     2.0         2.0       2.5
(c) Other costs                   4.0         5.0       8.0
Total cost                        10.0        12.0      16.0
Sale proceeds per chair           12.0        12.0      15.0
Profit (+) or Loss (-) per chair  (+) 2.0     Nil       (-) 1.0


Solution. Take the sale proceeds per chair in each of the three years as 100 and
express the other figures as its percentages. The percentages so obtained are given in
the following table :


Particulars          1972 %      1973 %      1974 %
Wages                33.3        41.7        36.7
Polishing            16.7        16.6        16.7
Other costs          33.3        41.7        53.3
Total cost           83.3        100.0       106.7
Proceeds             100.0       100.0       100.0
Profit or loss       (+) 16.7    Nil         (-) 6.7


These percentages would be represented diagrammatically as follows :


[Figure: Cost, proceeds and profit or loss per chair, 1972-1974 (percentage bars, scale 0 to 100)]
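The percentages of Illustration 5 come from dividing each item by the sale proceeds per chair and multiplying by 100. A minimal sketch, with an assumed dictionary layout and an illustrative function name. Plain rounding gives 16.7 for polishing in 1973, whereas the table shows 16.6, apparently adjusted so that the 1973 components add to exactly 100.

```python
# Express each cost item as a percentage of sale proceeds (Illustration 5).
per_chair = {
    1972: {"Wages": 4.0, "Polishing": 2.0, "Other costs": 4.0, "Proceeds": 12.0},
    1973: {"Wages": 5.0, "Polishing": 2.0, "Other costs": 5.0, "Proceeds": 12.0},
    1974: {"Wages": 5.5, "Polishing": 2.5, "Other costs": 8.0, "Proceeds": 15.0},
}


def percent_of_proceeds(row):
    """Return each item, total cost and profit/loss as % of sale proceeds."""
    proceeds = row["Proceeds"]
    costs = {k: v for k, v in row.items() if k != "Proceeds"}
    total_cost = sum(costs.values())
    result = {k: round(100 * v / proceeds, 1) for k, v in costs.items()}
    result["Total cost"] = round(100 * total_cost / proceeds, 1)
    # Positive = profit, negative = loss.
    result["Profit or loss"] = round(100 * (proceeds - total_cost) / proceeds, 1)
    return result


for year, row in per_chair.items():
    print(year, percent_of_proceeds(row))
```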
Illustration 6. The following table shows the monthly expenditure of two families.
Represent the data by a suitable bar diagram on a percentage basis :

Items of Expenditure     Expenditure in Rupees
                         Family A     Family B
Food                     43           83
Clothing                 8            17
Recreation               3            10
Education                5            9
Rent                     10           21
Miscellaneous            6            15

(B.A., Bombay, 1973)


Solution : CALCULATION FOR DRAWING PERCENTAGE BAR

                         Family A                     Family B
Items of Expenditure     Rs.    % of     Cum. %       Rs.    % of     Cum. %
                                Total                        Total
Food                     43     57.3     57.3         83     53.6     53.6
Clothing                 8      10.7     68.0         17     10.9     64.5
Recreation               3      4.0      72.0         10     6.5      71.0
Education                5      6.7      78.7         9      5.8      76.8
Rent                     10     13.3     92.0         21     13.5     90.3
Miscellaneous            6      8.0      100.0        15     9.7      100.0


[Figure: Percentage bar diagram of monthly expenditure of Family A and Family B]
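The cumulative percentages mark the points at which each percentage bar is divided. A minimal sketch with the Illustration 6 figures; the function name is illustrative. Because each value is rounded independently, the single percentages for Food and Clothing of Family B come out as 53.5 and 11.0 rather than the 53.6 and 10.9 shown in the table, which appear to have been adjusted so that the column adds to 100.

```python
from itertools import accumulate

# Monthly expenditure in rupees (Illustration 6).
expenditure = {
    "Family A": [("Food", 43), ("Clothing", 8), ("Recreation", 3),
                 ("Education", 5), ("Rent", 10), ("Miscellaneous", 6)],
    "Family B": [("Food", 83), ("Clothing", 17), ("Recreation", 10),
                 ("Education", 9), ("Rent", 21), ("Miscellaneous", 15)],
}


def percentage_bar(items):
    """Return (item, % of total, cumulative %) rows for one percentage bar."""
    total = sum(value for _, value in items)
    running = accumulate(value for _, value in items)
    return [(name, round(100 * value / total, 1), round(100 * cum / total, 1))
            for (name, value), cum in zip(items, running)]


for family, items in expenditure.items():
    for row in percentage_bar(items):
        print(family, *row)
```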


Note. The bar diagram drawn for the above data fails to portray the differences
in total expenditure of the two families. This defect can be removed by a rectangular
diagram as in Illustration 10.


(e) Deviation Bars

Deviation bars are popularly used for representing net quantities, i.e.,
excess or deficit, such as net profit, net loss, net exports or imports, etc.
Such bars can have both positive and negative values. Positive values are
shown above the base line and negative values below it. The following
illustration would explain this type of diagram :




Illustration 7.
VALUE OF SEA-BORNE INDO-U.S. TRADE DURING 1970-75
(million rupees)

Year       Exports     Imports     Balance of Trade
                                   (Deviation)
1970-71    35          15          +20
1971-72    100         85          +15
1972-73    70          90          -20
1973-74    120         130         -10
1974-75    140         180         -40

Solution. 


[Figure: Value of sea-borne trade of India with U.S.A., 1970-71 to 1974-75 (deviation bars; x-axis: Years)]
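The balance of trade, and the side of the base line on which each deviation bar falls, follow directly from exports minus imports. A minimal sketch with the Illustration 7 figures; the function name is illustrative.

```python
# Exports and imports in million rupees (Illustration 7).
trade = [
    ("1970-71", 35, 15),
    ("1971-72", 100, 85),
    ("1972-73", 70, 90),
    ("1973-74", 120, 130),
    ("1974-75", 140, 180),
]


def deviation_bars(rows):
    """Return (year, balance, position): surplus above the base line, deficit below."""
    bars = []
    for year, exports, imports in rows:
        balance = exports - imports
        if balance > 0:
            position = "above"
        elif balance < 0:
            position = "below"
        else:
            position = "on"
        bars.append((year, balance, position))
    return bars


for bar in deviation_bars(trade):
    print(*bar)
```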


Broken Bars 


In certain series there may be wide variations in values: some values
may be very small and others very large. In order to gain space for the
smaller bars of the series, the largest bars may be broken.


Illustration 8. Represent the following data by a suitable diagram :

Colleges     Number of students in the B.Com. Class

Solution. In college D, the number of students is the maximum (12 times that
of college B). In order to gain space we have broken the bar for college D. Otherwise
the length of this bar would have been 12 times that of the bar for college B and the
diagram would have occupied a lot of space and given an ugly look. (See the diagram
below.)


[Figure: No. of students (college-wise), with the bar for college D broken; y-axis: No. of students, x-axis: Colleges]


II. Two-dimensional Diagrams


As distinguished from one-dimensional diagrams in which only the
length of the bars is taken into account, in two-dimensional diagrams the
length as well as the width of the bars is considered. Thus the area of
the bars represents the given data. Two-dimensional diagrams are also
known as surface diagrams or area diagrams. The important types of such
diagrams are :


(a) Rectangles, (b) Squares, and (c) Circles. 


(a) Rectangles


This form is quite popular. Since the area of a rectangle is equal to
the product of its length and width, while constructing such a diagram
both length and width are considered. When two sets of figures are to be
represented by rectangles, either of two methods may be adopted. We
may represent the figures as they are given or may convert them to
percentages and then sub-divide the length into various components. The
latter method is more popular than the former as it enables comparison to
be made on a percentage basis. The following examples would illustrate
both these methods of constructing rectangular diagrams :
Illustration 9. Present the following data by a rectangular diagram.

                                     Commodities
                                     A (Rs.)     B (Rs.)
Price per unit of commodity          10          12
Quantity sold (units)                20          24
Cost of raw materials used           100         120
Other costs                          60          96
Profit                               40          72

(B. Com., Bangalore, 1973)
Solution. Let us calculate per unit the cost of raw materials, other expenses and
profit :

                          Commodity A                   Commodity B
                          (20 units)                    (24 units)
                          Total (Rs.)   Per unit (Rs.)  Total (Rs.)   Per unit (Rs.)
Cost of raw materials     100           5               120           5
Other expenses            60            3               96            4
Profit                    40            2               72            3
Total                     200           10              288           12


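The widths of the rectangles would be in the ratio of 20 : 24, i.e., 5 : 6 or 2.5 : 3, and their heights would be the price per unit (Rs. 10 and Rs. 12), sub-divided into the per-unit cost of raw materials, other expenses and profit.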


[Figure: Rectangular diagram for commodity A (20 units) and commodity B (24 units)]
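In this rectangular diagram the width is the number of units and the height is the price per unit, so the area of each component equals its rupee total. A minimal sketch checking this with the Illustration 9 figures; the dictionary layout and function name are assumptions for illustration.

```python
# Totals in rupees and units sold (Illustration 9).
commodities = {
    "A": {"units": 20, "Raw materials": 100, "Other expenses": 60, "Profit": 40},
    "B": {"units": 24, "Raw materials": 120, "Other expenses": 96, "Profit": 72},
}


def rectangle(spec):
    """Return (width, per-unit heights of the segments, total height)."""
    width = spec["units"]
    per_unit = {k: v / width for k, v in spec.items() if k != "units"}
    return width, per_unit, sum(per_unit.values())


for name, spec in commodities.items():
    width, segments, height = rectangle(spec)
    # Area of the whole rectangle = units x price per unit = sale proceeds.
    print(name, width, segments, height, width * height)
```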


Illustration 10. Represent the following data relating to the monthly expenditure
of two families A and B by means of a rectangular diagram on a percentage basis :

Items of       Family A   Family B      Items of         Family A   Family B
Expenditure    Rs.        Rs.           Expenditure      Rs.        Rs.
Food           160        140           Fuel             40         60
Clothing       80         110           House Rent       20         50
Education      60         100           Miscellaneous    40         40

(B. Com., Bangalore, April, 1974)
Solution. The given figures are to be converted into percentages as follows :

                          Family A                    Family B
Items of Expenditure      Rs.     %      Cum. %       Rs.     %      Cum. %
Food                      160     40     40           140     28     28
Clothing                  80      20     60           110     22     50
Education                 60      15     75           100     20     70
Fuel                      40      10     85           60      12     82
House Rent                20      5      90           50      10     92
Miscellaneous             40      10     100          40      8      100
Total                     400     100                 500     100


Since the expenditure of Family A is Rs. 400 and that of Family B is Rs. 500, the
widths of the rectangles will be in the ratio of 4 : 5.
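Because the widths are proportional to total expenditure and the heights are percentages, the area of every segment stays proportional to the rupees actually spent, which is what the plain percentage bar could not show. A minimal sketch with the Illustration 10 data; the scale of one width unit per Rs. 100 is an assumption chosen only to reproduce the 4 : 5 ratio.

```python
import math

# Monthly expenditure in rupees (Illustration 10).
families = {
    "A": {"Food": 160, "Clothing": 80, "Education": 60, "Fuel": 40,
          "House Rent": 20, "Miscellaneous": 40},
    "B": {"Food": 140, "Clothing": 110, "Education": 100, "Fuel": 60,
          "House Rent": 50, "Miscellaneous": 40},
}
RUPEES_PER_WIDTH_UNIT = 100  # assumed scale: 1 unit of width = Rs. 100


def percentage_rectangle(items):
    """Return the width and, per item, (percentage height, segment area)."""
    total = sum(items.values())
    width = total / RUPEES_PER_WIDTH_UNIT
    segments = {}
    for name, rupees in items.items():
        pct = 100 * rupees / total
        # area = width x (pct / 100), which equals rupees / RUPEES_PER_WIDTH_UNIT
        segments[name] = (pct, width * pct / 100)
    return width, segments


for name, items in families.items():
    width, segments = percentage_rectangle(items)
    ok = all(math.isclose(area * RUPEES_PER_WIDTH_UNIT, items[k])
             for k, (_, area) in segments.items())
    print(name, width, ok)
```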


The area diagram is more difficult to read than to construct because
of the problem of judging areas.


(b) Squares 

The rectangular method of diagrammatic presentation is difficult to
use where the values of items vary widely. For example, if in Illustration 9
the numbers of units sold of commodities A and B were 20 and 240
respectively, the widths of the rectangles would be in the ratio of 20 : 240,
i.e., 1 : 12. If this ratio is taken, the diagram would look very unwieldy.
It is in order to overcome this difficulty that squares are used.

The method of drawing a square diagram is very simple. One has 
to take the square-root of the values of various items that are to be shown 
in the diagrams and then select a suitable scale to draw the squares. 
The following example would illustrate the procedure of drawing
such a diagram : 


Illustration 11. Represent the following data through (i) squares, (ii) circles :

Countries     Production of Cane-sugar
              1974-75 (in tonnes)
India         2,75,00,000
Jawa          1,55,00,000
Hawaii        83,50,000
Columbia      5,10,000

(Figures Imaginary)
Solution. CALCULATIONS FOR DRAWING SQUARE DIAGRAM

Countries     Production of Cane-sugar     Square-roots     Side of the Square
              (10,000 omitted)                              in inches
India         2,750                        52.44            0.87
Jawa          1,550                        39.37            0.66
Hawaii        835                          28.90            0.48
Columbia      51                           7.14             0.12


Note. Each square-root has been divided by 60 to obtain the side of the square
in inches.
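The side of each square is the square-root of the value (in units of 10,000 tonnes) divided by the scale factor of 60 chosen above. A minimal sketch reproducing the table; the function name and its default arguments are illustrative.

```python
import math

# Production of cane-sugar in tonnes (Illustration 11, figures imaginary).
production = {
    "India": 2_75_00_000,
    "Jawa": 1_55_00_000,
    "Hawaii": 83_50_000,
    "Columbia": 5_10_000,
}


def square_side(tonnes, unit=10_000, divisor=60):
    """Side in inches: sqrt(value in units of 10,000 tonnes) / divisor."""
    return math.sqrt(tonnes / unit) / divisor


for country, tonnes in production.items():
    print(country, round(math.sqrt(tonnes / 10_000), 2), round(square_side(tonnes), 2))
```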


[Figure: Production of cane sugar (1974-75), square diagram for India, Jawa, Hawaii and Columbia]


(c) Circles

Another way of preparing a two-dimensional diagram is in the form
of circles. In such diagrams both the total and the component parts or
sectors can be shown. The area of a circle is proportional to the square
of its radius. As in the construction of squares, the square-roots of various
figures are worked out while constructing the circles. However, in the
latter case the radii of the circles (rather than the sides of squares) are
proportional to the square-roots of the figures.
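Since only the ratios of the radii matter, one way to scale a circle diagram is to fix the radius of the largest circle and scale the others by the square-root of their ratio to it. A minimal sketch with the Illustration 11 figures; the 1-inch radius for the largest circle is an assumed choice, as the text gives no scale for the circles.

```python
import math

# Production in units of 10,000 tonnes (Illustration 11).
production = {"India": 2_750, "Jawa": 1_550, "Hawaii": 835, "Columbia": 51}


def circle_radii(values, largest_radius=1.0):
    """Radii proportional to the square-roots of the values."""
    largest = max(values.values())
    return {k: largest_radius * math.sqrt(v / largest) for k, v in values.items()}


for country, r in circle_radii(production).items():
    print(country, round(r, 2))
```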


Circles can be used in all those cases in which squares are used.
However, in both these types of diagrams it is difficult to judge the
relative magnitudes with precision. The data of Illustration 11 are
represented by means of circles as follows :
[Figure: Production of sugar cane (1974-75), circle diagram]


Circles are difficult to compare and as such are not very popular in
statistical work. When it is necessary to use circles, they should be
compared on an area basis rather than on a diameter basis, as the diameter
basis is very misleading. Compared to rectangles, circles are more difficult to
construct and interpret.
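Why the diameter basis misleads can be seen with a little arithmetic: if one value is four times another, drawing the diameters in the ratio 4 : 1 makes the areas 16 : 1, whereas on the area basis the diameters should be only 2 : 1. A small sketch of the two rules; the function names are illustrative.

```python
import math


def diameter_ratio_area_basis(value_ratio):
    """Correct rule: areas proportional to values, so diameters scale by the square-root."""
    return math.sqrt(value_ratio)


def area_ratio_diameter_basis(value_ratio):
    """Misleading rule: diameters proportional to values, so areas scale by the square."""
    return value_ratio ** 2


print(diameter_ratio_area_basis(4))   # 2.0
print(area_ratio_diameter_basis(4))   # 16
```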


Pie Diagram 


A very common use of the pie chart is to represent the division of
a sum of money into its components. For example, the entire circle,
or pie, may represent the budget of a family for a month, and the
sections may represent portions of the budget allotted to rent, food,
clothing, and so on. Similarly, through a pie diagram we can show how
a rupee spent by a firm is distributed over various heads such as wages,
raw materials, administrative expenses, etc.


The pie chart is so called because the entire graph looks like a pie,
and the components resemble slices cut from a pie.


In constructing a pie chart the first step is to prepare the data so
that the various component values can be transposed into corresponding
degrees on the circle. Suppose there are four components in a series
representing the following values: (i) 60 per cent, (ii) 25 per cent,
(iii) 10 per cent, and (iv) 5 per cent. Since 1 per cent is equal to 3.6
degrees (360/100 = 3.6), the corresponding values of the four components in
the illustration are 60.0 × 3.6 = 216°; 25.0 × 3.6 = 90°; 10.0 × 3.6 = 36°;
5.0 × 3.6 = 18°.
The second step is to draw a circle of appropriate size with a compass.
The size of the radius depends upon the available space and other
considerations of presentation.


The third step is to measure points on the circle representing the
size of each sector with the help of a protractor. The ordinary protractor
is based upon a scale in which the total circle is 360 degrees, but it is
possible to purchase a protractor in which the entire circle is divided not
into 360 but 100 equal parts so that the angle representing any desired
per cent can be read directly.

In laying out the sectors for a pie chart it is desirable to follow some
logical arrangement, pattern or sequence. For example, it is a common
procedure to arrange the sectors according to size, with the largest at the
top and others in sequence running clockwise. An essential feature of
the pie chart is the careful identification of each sector with some kind of
explanatory or descriptive label. If there is sufficient room, the labels can
be placed inside the sectors; otherwise the labels should be placed in
contiguous positions outside the circle, usually with an arrow pointing to
the appropriate sector.
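The conversion in the first step (1 per cent = 3.6 degrees) is easily mechanised. A minimal sketch that reproduces the four-component example above and the angles of Illustration 12 below; the function name and the check that the shares add to 100 are illustrative choices.

```python
def to_degrees(percentages):
    """Convert percentage shares of a whole into sector angles (1% = 3.6 degrees)."""
    if abs(sum(percentages) - 100) > 1e-9:
        raise ValueError("percentages must add up to 100")
    return [round(p * 360 / 100, 1) for p in percentages]


print(to_degrees([60, 25, 10, 5]))            # [216.0, 90.0, 36.0, 18.0]
print(to_degrees([20, 18, 10, 15, 25, 12]))   # [72.0, 64.8, 36.0, 54.0, 90.0, 43.2]
```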
Illustration 12. The following figures relate to the cost of construction of a
house in Delhi :

Items       Expenditure     Items            Expenditure
Cement      20%             Timber           15%
Steel       18%             Labour           25%
Bricks      10%             Miscellaneous    12%

Represent the data by a suitable diagram. (B.A. Hons. Econ., Kurukshetra, 1975)
Solution :

Items            Percentage × 3.6 = Degrees
Cement           20 × 3.6 = 72.0
Steel            18 × 3.6 = 64.8
Bricks           10 × 3.6 = 36.0
Timber           15 × 3.6 = 54.0
Labour           25 × 3.6 = 90.0
Miscellaneous    12 × 3.6 = 43.2
Total                       360.0


Now a circle shall be drawn suited to the size of the paper and divided into
6 parts according to the degrees of angles at the centre. (The angles have been arranged
in descending order.)


[Figure: Pie diagram showing the cost of construction of a house in Delhi]


Illustration 13. The following data relate to the expenditure of three families
per month :

Items of Expenditure     Family A     Family B     Family C
Food                     40           60           160
Rent                     20           40           150
Clothing                 20           30           100
Education                10           40           80
Litigation               5            10           30
Miscellaneous            5            20           80

Represent these data by an angular (pie) diagram.
Solution. CALCULATIONS FOR CONSTRUCTING THE DIAGRAM

Items of          Family A           Family B           Family C
Expenditure       Rs.   Degrees      Rs.   Degrees      Rs.   Degrees
Food              40    144          60    108          160   96
Rent              20    72           40    72           150   90
Clothing          20    72           30    54           100   60
Education         10    36           40    72           80    48
Litigation        5     18           10    18           30    18
Miscellaneous     5     18           20    36           80    48
Total             100   360          200   360          600   360

The degrees for each item are obtained as (expenditure on the item ÷ total
expenditure of the family) × 360; for example, Food for Family A gets
(40/100) × 360 = 144 degrees.
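The degrees for each family are the item's share of that family's total multiplied by 360. A minimal sketch reproducing the table above. If the three pies are also meant to show the difference in total expenditure, one possible refinement (an assumption here, following the rule for circles given earlier) is to make the radii proportional to the square-roots of the totals; the last lines compute those relative radii.

```python
import math

# Monthly expenditure in rupees (Illustration 13): (Family A, Family B, Family C).
expenditure = {
    "Food": (40, 60, 160),
    "Rent": (20, 40, 150),
    "Clothing": (20, 30, 100),
    "Education": (10, 40, 80),
    "Litigation": (5, 10, 30),
    "Miscellaneous": (5, 20, 80),
}
FAMILIES = ("A", "B", "C")


def pie_angles(column):
    """Return the family's total and the sector angle (degrees) of each item."""
    total = sum(row[column] for row in expenditure.values())
    angles = {item: row[column] / total * 360 for item, row in expenditure.items()}
    return total, angles


totals = {}
for i, family in enumerate(FAMILIES):
    total, angles = pie_angles(i)
    totals[family] = total
    print(family, total, {k: round(v) for k, v in angles.items()})

# Optional (assumed) refinement: radii proportional to sqrt(total), relative to Family A.
relative_radius = {f: math.sqrt(t / totals["A"]) for f, t in totals.items()}
print({f: round(r, 3) for f, r in relative_radius.items()})
```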


Limitations of Pie Diagrams

Pie diagrams are less effective than bar diagrams for accurate reading
and interpretation, particularly when series are divided into a large
number of components or the differences among the components are very
small. It is generally inadvisable to attempt to portray a series of more
than five or six categories by means of a pie chart. If, for example, there
are eight, ten or more categories it may be very confusing to differentiate
the relative values portrayed, especially when several small sectors are
of approximately the same size. This type of diagram, although frequently
used, appears upon comparison inferior to the simple bar diagram, the divided
bar diagram or a group of curves.
III. Three-dimensional Diagrams


Three-dimensional diagrams, also known as volume diagrams, consist
of cubes, cylinders, blocks, etc. In such diagrams three things, namely,
length, width and height, have to be taken into account. Such diagrams
are used where the range of difference between the smallest and the largest
value is very large. For example, if two values are in the ratio of 1 : 1,000
and bars are used to represent them, the shorter bar would be only
one-thousandth of the length of the longer bar. If squares or circles are
used, the side of one square or the radius of one circle would be about
31.6 times that of the other (since √1,000 ≈ 31.6), which is still very
unwieldy. However, if cubes are used, their sides would be in the ratio of
1 : 10 (since the cube-root of 1,000 is 10). This example makes it clear that
three-dimensional diagrams have an important role to play when the gap
between the smallest and the largest value is very large.
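The argument above can be checked numerically: for a value ratio r, the linear dimension grows as r for bars, as √r for squares or circles and as ∛r for cubes. A minimal sketch; the function name is illustrative.

```python
# Linear size ratio needed to show a value ratio in 1, 2 or 3 dimensions.
def linear_ratio(value_ratio, dimensions):
    """Ratio of bar lengths, square sides/circle radii, or cube sides."""
    return value_ratio ** (1 / dimensions)


for dims, kind in ((1, "bars"), (2, "squares or circles"), (3, "cubes")):
    print(f"{kind}: 1 : {linear_ratio(1000, dims):.2f}")
```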


Cubes 


Amongst three-dimensional diagrams cubes are the most popular and
also the simplest to draw. The side of a cube is drawn in proportion to the
cube-root of the magnitude of data.


Illustration 14. Represent the following data by a three-dimensional diagram :
AREA UNDER SUGARCANE IN VARIOUS STATES OF INDIA (1974-1975)

U.P.        6,07,216 acres        Madhya Pradesh     1,29,785 acres
Bihar       3,56,700 acres        Tamil Nadu         36,895 acres

(Figures Imaginary)


Solution : Since there is a considerable gap between the largest and the smallest
values, cubes would best represent these data. The sides of the cubes can be determined as
follows :

States             Area (acres)     Cube-roots     Side of cubes in inches
U.P.               6,07,216         84.68          0.706
Bihar              3,56,700         70.92          0.591
Madhya Pradesh     1,29,785         50.63          0.422
Tamil Nadu         36,895           33.29          0.277


We have arrived at the sides of cubes by dividing the cube-roots by 120. 


[Figure: Area under sugarcane, cubes for U.P., Bihar, Madhya Pradesh and Madras]
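The side of each cube is the cube-root of the area divided by the scale factor of 120 chosen in the text. A minimal sketch reproducing the table above; the function name is illustrative.

```python
# Area under sugarcane in acres (Illustration 14, figures imaginary).
area = {
    "U.P.": 6_07_216,
    "Bihar": 3_56_700,
    "Madhya Pradesh": 1_29_785,
    "Tamil Nadu": 36_895,
}


def cube_side(acres, divisor=120):
    """Side of the cube in inches: cube-root of the value divided by the scale factor."""
    return acres ** (1 / 3) / divisor


for state, acres in area.items():
    print(state, round(acres ** (1 / 3), 2), round(cube_side(acres), 3))
```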
The construction of cubes is not so difficult as it appears to be. The
following points would clarify the procedure :

(i) Construct a square with sides of 0.7 inch. It is represented by
ABCD in the first cube above.
(ii) Find out the mid-point of the line DC and draw a perpendicular
0.7 inch in length, 0.35 inch below DC and 0.35 inch above it. It is
represented by EF.
(iii) Join DF and from C draw a line parallel to DF and equal to it in
length. It is represented by CG.
(iv) Join FG and from the point G draw a line parallel to FE and
equal to it in length. It is represented by GH.
(v) Join B and H. The required cube is ABHGFD.

In a similar manner other cubes can be constructed.
Limitations of Three-dimensional Diagrams 


required. In many cases it is found upon examination that published 
diagrams of this type are not in fact proportionate to the magnitudes they 
pretend to represent. 


IV. Pictograms and Cartograms
(i) Pictograms

Pictograms are attractive and easy to comprehend and as such this method is
particularly useful in presenting statistics to the layman. When pictograms are
used, data are represented through a pictorial symbol that is carefully selected.
The picture symbol should be self-explanatory in nature, i.e., it should represent
the phenomenon clearly. For example, if the increase in the number of buses
on the road is shown over a period of time, the appropriate symbol would be
a bus.


Illustration 15. Represent the following data by a pictogram :
NUMBER OF FEMALE STUDENTS IN DELHI UNIVERSITY 


Years No. of Female Students 
1965 15,000 
1966 20,000 
1967 25,000 


Solution. Let one figure represent 5,000 female students. Three figures would,
therefore, represent 15,000 female students, four figures 20,000, and so on, as shown
by the following diagram:


[Figure: NUMBER OF FEMALE STUDENTS IN DELHI UNIVERSITY DURING 1965-1967 (1 FIGURE = 5,000 STUDENTS)]
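As a rough sketch (not the book's procedure), the number of figures needed for each year is simply the count divided by the unit of 5,000. Here a `#` character stands in for the picture of a student, and `symbols_needed` is a hypothetical helper.

```python
# Number of pictogram figures per year when one figure stands for 5,000 students.
students = {1965: 15000, 1966: 20000, 1967: 25000}
UNIT = 5000  # students represented by one figure


def symbols_needed(count, unit=UNIT):
    """Number of figures (possibly fractional) needed to show `count`."""
    return count / unit


for year, count in students.items():
    n = symbols_needed(count)
    # "#" is a text stand-in for the picture of a student
    print(year, "#" * int(n), f"({n:g} figures)")
```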


Merits. (1) Compared with other types of diagrams, pictograms
have greater attraction value and are therefore very popular wherever
the attention of the masses is to be drawn, as in exhibitions and fairs.
They stimulate interest in the information being represented.

(2) Facts portrayed in pictorial form are generally remembered 
longer than facts presented in tables or in non-pictorial charts. 


Limitations. However, pictograms have some limitations. They
are difficult to construct. Besides, one symbol must represent a fixed
number of units, which may create difficulties. Thus, in the above
example, the picture of one girl is used to represent 5,000 girls. Now, if
the number of girls is 9,650, how should we represent them? Either the
symbol should be drawn proportionately smaller or the figure approximated
to 10,000. In either case, error is introduced.
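The two ways out of this difficulty can be compared numerically. The sketch below assumes a unit of 5,000. `partial_symbols` and `rounded_symbols` are hypothetical names. The first returns the exact number of figures, with the remainder shown as a fraction of a figure. The second returns the rounded number of figures together with the error that rounding introduces.

```python
UNIT = 5000  # students per figure, as in Illustration 15


def partial_symbols(count, unit=UNIT):
    """Whole figures plus the fraction of one figure drawn cut down in size."""
    whole, remainder = divmod(count, unit)
    return whole, remainder / unit


def rounded_symbols(count, unit=UNIT):
    """Round to the nearest whole figure and report the error (shown minus actual)."""
    n = round(count / unit)
    return n, n * unit - count


print(partial_symbols(9650))  # (1, 0.93): one full figure and 0.93 of another
print(rounded_symbols(9650))  # (2, 350): two figures overstate the count by 350
```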


(ii) Cartograms 


Cartograms or statistical maps are used to give quantitative informa-
tion on a geographical basis. They are thus used to represent spatial
distribution. The quantities on the map can be shown in many ways,
such as through shades or colours, by dots, by placing pictograms in each
geographical unit, or by placing the appropriate numerical figure in each
geographical unit.

The cartogram given below shows the distribution of rain in different 
parts of the country for the period June-Sept., 1975. 




Statistical maps should be used only where geographic comparisons
are of primary importance and where approximate measures will suffice.
For more accurate representation of size, bar charts are preferable. To
be sure, maps are sometimes combined with bars drawn in the appropriate
areas.


CHOICE OF A SUITABLE DIAGRAM 


Which diagram to select out of several in a given situation is a
ticklish problem. The choice would primarily depend upon two factors,
namely: (i) the nature of the data; and (ii) the type of people for whom
the diagram is meant. The nature of the data would determine whether
to use a one-dimensional, two-dimensional or three-dimensional diagram,
and if it is one-dimensional, whether to adopt the simple bar or sub-divided


bar chart can be divided on the following basis:


(a) Simple bar charts should be used where changes in totals are
to be conveyed.


(b) Component bar charts are more useful where changes in totals
as well as in the size of the component figures (absolute ones) are to
be displayed.
(c) Percentage composition bar charts are better suited where changes
in the relative size of component figures are to be exhibited.

(d) Multiple bar charts should be used where changes in the absolute 
values of the component figures are to be emphasised and the overall total 
is of no importance. 

However, multiple and component bar charts should be used only
when there are not more than three or four components, as a large number
of components makes the bar charts too complex to give a worthwhile
visual impression. When a large number of components have to be
shown, a pie chart is more suitable.

A pie chart is particularly useful where it is desired to show the
relative proportions of the figures that go to make up a single overall total.
Unlike bar charts, it is not restricted to three or four component figures,
although its effectiveness tends to dwindle with more than seven or eight
components.

Pie charts cannot be used effectively where a series of figures is
involved, as a number of different pie charts are not easy to compare. Nor
should changes in the overall total be shown by changing the size of the
‘pie’.

Occasionally, circles are used to represent size. But they are difficult to
compare and should not be used when it is possible to use bars, because
it is easier to compare the lengths of lines or bars than to compare areas
or volumes.

Cubes should be used in those cases where the difference between the 
smallest and largest values to be represented is very large. In other cases 
cubes should not be used because comparison is too difficult with the help 
of cubes. 

Pictograms and cartograms are very elementary forms of visual
presentation. However, they are more informative and more effective than
other forms for presenting data to the general public who, by and large,
neither possess much ability to understand nor take interest in the less
attractive forms of presentation. The pictogram is admirably suited to
illustrating exhibits or articles in newspapers and magazines, or to
dressing up annual reports. Cartograms or statistical maps are particularly
effective in bringing out the geographical pattern that may lie concealed in
the data.
GRAPHS 

A large variety of graphs is used in practice. However, here we
shall discuss only some important types of graphs which are more popular.
Broadly, the various graphs can be divided under the following two heads:

1. Graphs of time series. 

2. Graphs of frequency distributions. 

Constructing charts and graphs is an art which can be acquired only
through practice, but there are a number of simple rules whose adoption
makes graphs more effective. However, before discussing these rules,
the elementary procedure of constructing a graph is considered.
Technique of Constructing Graphs 


For constructing graphs, we make use of graph paper. Two simple
lines are first drawn which intersect each other at right angles. The
horizontal line is called the axis of X or ‘abscissa’ and the vertical line
the axis of Y or ‘ordinate’. The alternative names are X-axis and Y-axis
respectively. The following are the two lines:


[Figure: The X- and Y-axes intersecting at the origin O and forming four quadrants, e.g. QUADRANT I (X, Y) and QUADRANT IV (X, -Y)]


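In the above figure, O is the point of origin, XOX' is the axis of X or
the ‘abscissa’ and YOY' the axis of Y or the ‘ordinate’. Both positive as
well as negative values can be shown.

In quadrant I, both the values of X and Y are positive. In quadrant II,
X is negative and Y positive; in quadrant III, both are negative; and in
quadrant IV, X is positive and Y negative.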
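The sign rules for the four quadrants can be summarised in a small function. `quadrant` is an illustrative helper, and it treats points lying on either axis as belonging to no quadrant.

```python
def quadrant(x, y):
    """Return the quadrant ("I" to "IV") containing (x, y), or None on an axis."""
    if x > 0 and y > 0:
        return "I"
    if x < 0 and y > 0:
        return "II"
    if x < 0 and y < 0:
        return "III"
    if x > 0 and y < 0:
        return "IV"
    return None  # the point lies on XOX' or YOY'


print(quadrant(3, 4), quadrant(-3, 4), quadrant(-1, -1), quadrant(2, -5))
```

Most business and economic data (years, prices, output) are positive, which is why graphs of such data usually occupy only quadrant I.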


It is conventional to take the independent variable on the horizontal
scale and the dependent variable on the vertical scale. In case of time series, time is


variable. The choice is made in such a manner that the entire data are
accommodated in the space available. The scales on the X-axis and Y-axis
need not be identical. The scale on the Y-axis generally begins from zero,
whereas on the X-axis it starts with the lowest value of the variable or the
first time period. Once the scale is chosen, equal spaces represent equal
amounts in the case of a natural scale; in the case of a ratio scale, however,
this is not so. No hard and fast rule can be laid down about the ratio of the
scale on the abscissa to that on the ordinate, because much depends upon
the given data and the size of the paper. Conventionally, however, the X-axis
is taken 1½ times as long as the Y-axis, but there is no rigidity about it.


After the choice of the scale is made, the last step in constructing a
graph is to plot the given data by taking the corresponding values of X
and Y. The various points so obtained are then joined by straight lines.
GRAPHS OF TIME SERIES OR LINE GRAPHS


When we observe the values of a variable at different points of time,
the series so formed is known as a time series. The technique of graphic
presentation is extremely helpful in analysing changes at different points of
time. On the X-axis we generally take the time and on the Y-axis the
value of the variable, and join the various points by straight lines. The
graph so formed is known as a line graph. Such graphs are the most widely
used in practice. They are the simplest to understand, the easiest to make and
the most adaptable to many uses. They require the least technical skill and at
the same time enable one to present more information of a complex nature
in a perfectly understandable form than any other kind of chart. Many
variables can be shown on the same graph and compared.


Graphs of time series can be constructed either on a natural scale or
on a ratio scale. On a natural or arithmetic scale, absolute changes from one
period to another are shown, whereas on a ratio scale the rates of change or
relative changes are shown. First of all, we take up line graphs on a
natural scale and then study such graphs on a ratio scale.


Rules for Constructing the Line Graphs on Natural Scale 


In constructing a graph of time series on a natural scale, the following
points should be kept in mind:


1. Take the time on the X-axis (horizontal) and the variable on
the Y-axis (vertical). The unit of time in which the variable under
consideration is measured should be clearly stated in the title, i.e., an
indication should be given as to whether the years are calendar or
financial or whether the variable is measured as at a date.


2. Begin the Y-axis with zero and select a suitable scale so that the
entire data are accommodated in the space available. On the arithmetic scale,
equal magnitudes must be represented by equal distances. This requirement
holds for the X-axis as well as the Y-axis, but for each separately. For
example, 1” on the Y-axis may represent 1,000 units whereas 1” on the
X-axis may represent the gap between 1970 and 1972. The scale should be
so chosen that the horizontal axis is longer than the vertical one. If the
fluctuations in the variable are too small or if the lowest value of the
variable is large, a false base should be used.


3. Corresponding to the time factor, plot the values of the variable
and join the various points by straight lines (and not with curves). The
points on the graph should not be indicated by circles or crosses; rather,
dots should be used so that they merge into the lines.


4. Join points with straight lines, not curves. 


5. If more than one variable is shown on one graph, they should
be distinguished by the use of thick, thin, dotted lines, etc., or by different
colours. Every graph should be given a suitable title. The
unit of time in which the variable under consideration is measured should
be clearly stated in the title, i.e., an indication should be given as to
whether the years are calendar or financial or whether the variable is
measured as at a date.


6. Lettering on the graph, i.e., indication of years, units, etc.,
should be done horizontally and not vertically, so that it is not necessary
to turn the graph from one side to another in order to read what is
written.
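To make rule 2 concrete, the following sketch converts a series into positions on graph paper. The Y-scale begins at zero, and successive periods are placed at equal distances. The paper dimensions (9 by 6 inches, following the conventional 1.5 : 1 proportion) and the function name `plot_positions` are assumptions for illustration. At least two periods are required.

```python
def plot_positions(values, x_len=9.0, y_len=6.0):
    """Map a time series to (x, y) positions in inches on a natural scale.

    The Y-scale starts at zero and its top equals the largest value, so
    equal amounts get equal vertical distances. Successive periods are
    spaced evenly along an X-axis 1.5 times as long as the Y-axis.
    """
    top = max(values)
    x_step = x_len / (len(values) - 1)  # equal distance for each period
    return [(i * x_step, v * y_len / top) for i, v in enumerate(values)]


# Three successive years with values 1,000, 2,000 and 3,000 units.
print(plot_positions([1000, 2000, 3000]))  # [(0.0, 2.0), (4.5, 4.0), (9.0, 6.0)]
```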
Graphs of One Variable


When only one variable is to be represented, measure time on the
X-axis and the values of the variable on the Y-axis, plot the various
points and join them by straight lines. The fluctuation of this line shows
the variations in the variable, and the distance of the plotted points from
the base line of the graph indicates the magnitude.


Illustration 16. Represent the following data graphically:

Years   Production of Rice   Years   Production of Rice
        (Million Tonnes)             (Million Tonnes)


One of the fundamental rules while constructing graphs is that the
scale on the Y-axis should begin from zero even if the lowest Y-figure
associated with any X period or value is far above zero. However, if this
is strictly followed, the curve would be very much pulled up towards the
right, i.e., away from the point of origin. When the gap between zero
and the smallest value of the variable is large, for example, if the variable
starts from 50,000, a lot of space would be required to show the variable.
It is in order to solve this difficulty that a false base is used. When


as shown in the graph on the next page. The following are the objects of
using a false base line:

1. To magnify the minor fluctuations on the graph so that they are
clearly visible to the reader.


2. To economise in space.

However, the reader has to be extra-cautious in interpreting a graph
in which a false base has been used, because the false base may be
deliberately used to magnify the fluctuations in a variable such as sales,
profits, production, etc.
The following graph illustrates how a false base is used:


[Figure: FALSE BASE GRAPH; Y-axis: values of the variable (4,000 to 8,000)]


Just as we have talked of the Y-axis starting from zero, we can also talk
of the X-axis starting from zero. To represent a false base on the X-axis, we
draw a kinked line as follows:


It is clear from the above that a considerable saving in space is possible
in case the variable starts from a value far away from 0. It may,
however, be noted that there is a growing feeling that there is no sanctity
in the X-axis and Y-axis starting from zero; they can well start from the
lowest value in the data or near about it. If that is so, the false
base line and kinked lines are of theoretical significance only.


Illustration 17. Represent the following data by a suitable graph (Base
1939 = 100):

Years   Index No. of Indian     Years   Index No. of Indian
        Industrial Profits              Industrial Profits
1941    187                     1946    229
1942    222                     1947    192
1943    246                     1948    260
1944    239                     1949    182
1945    234                     1950    247

(B. Com., Osmania, 1969)
Solution: To indicate clearly the variations in profits, the use of a false base would
be more appropriate here.


[Figure: GRAPH REPRESENTING INDEX NOS. OF INDIAN INDUSTRIAL PROFITS FROM 1941-1950, drawn on a false base]
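How much a false base magnifies fluctuations can be estimated for the data of Illustration 17. This sketch assumes that the graph keeps the same height whether its Y-axis starts at zero or at the false base. It picks the base with an illustrative rule of rounding the smallest value down to the nearest 10. The helper names are hypothetical.

```python
# Illustration 17: index numbers of Indian industrial profits (1939 = 100).
index = {1941: 187, 1942: 222, 1943: 246, 1944: 239, 1945: 234,
         1946: 229, 1947: 192, 1948: 260, 1949: 182, 1950: 247}


def suggest_false_base(values, step=10):
    """A round number just below the smallest value (illustrative rule of thumb)."""
    return (min(values) // step) * step


def magnification(values, false_base):
    """How many times taller a fluctuation looks when the Y-axis runs from
    false_base (instead of 0) up to the largest value on the same height."""
    top = max(values)
    return top / (top - false_base)


base = suggest_false_base(index.values())
print(base, magnification(index.values(), base))  # 180 3.25
```

A magnification of 3.25 means every rise and fall appears 3.25 times as steep as it would on a zero-based scale of the same height, which is exactly why the text warns readers to interpret such graphs cautiously.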


Illustration 18. The following figures give the estimated future population of
the U.K. at 5-yearly intervals:


Years Population Years Population 
1969 54,324,000 1989 58,300,000 
1974 55,385,000 1994 59,188,000 
1979 56,393,000 1999 60,115,000 
1984 57,365,000 


Draw a line graph of this series. What factors do you think have been taken
into account in making this estimate? Is it possible, by continuing the graph, to make
valid estimates for subsequent years? (B. Com., Delhi, 1970)


Solution : 


[Figure: ESTIMATED POPULATION OF U.K.]


The following factors are generally considered while projecting population
figures:

(a) Birth rates,

(b) Death rates, and

(c) Rates of immigration and emigration.

Whenever the population for a particular period is to be projected, a base year is
chosen and the above adjustments are made.
Valid estimates for subsequent years can be made by continuing the graph only
on the assumption that the above rates remain more or less the same. If there is a
major change in any of these rates, valid estimates for the future cannot be made by
projecting the graph.
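The answer to the last question can be illustrated by measuring the growth in each five-year interval and projecting forward at a constant rate. This is a sketch under the assumption stated in the text, namely that the rates remain more or less the same. The use of the average growth rate and the helper `extend` are illustrative choices, not the method used to prepare the official estimates.

```python
# Illustration 18: estimated population of the U.K.
years = [1969, 1974, 1979, 1984, 1989, 1994, 1999]
population = [54_324_000, 55_385_000, 56_393_000, 57_365_000,
              58_300_000, 59_188_000, 60_115_000]

# Absolute growth and growth rate over each five-year interval.
increments = [b - a for a, b in zip(population, population[1:])]
rates = [b / a - 1 for a, b in zip(population, population[1:])]


def extend(last_value, rate, steps):
    """Project `steps` further five-year intervals at a constant growth rate."""
    return [round(last_value * (1 + rate) ** k) for k in range(1, steps + 1)]


average_rate = sum(rates) / len(rates)
print(increments)
# Estimates for 2004 and 2009, valid only if the underlying rates hold.
print(extend(population[-1], average_rate, 2))
```

The increments are not constant (they fall from 1,061,000 to 888,000 and then rise to 927,000), which is a reminder that birth, death and migration rates do shift, and that any straight continuation of the graph rests on an assumption.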
Graphs of Two or More Variables


because different lines may cut each other and make it difficult to under-
stand the behaviour of the variables. Therefore, for the sake of clarity, we
should not represent more than 5 or 6 variables on the same graph. When
two or more variables are shown on the same graph, it is desirable to use
thick, thin, broken, dotted lines, etc., to distinguish between the various
variables.


Illustration 19. Represent the following data graphically:


Months       Exports          Imports          Months       Exports          Imports
             (Rs. millions)   (Rs. millions)                (Rs. millions)   (Rs. millions)
April 1964   217              213              Oct. 1964    220              225
May          220              219              Nov.         218              227
June         215              222              Dec.         210              205
July         210              215              Jan. 1965    215              209
Aug.         225              218              Feb.         212              222
Sept.        227              225              Mar.         215              218

(B. Com., Andhra, 1969)


Solution : 


[Figure: GRAPH SHOWING EXPORTS AND IMPORTS FROM APRIL 1964 TO MARCH 1965; X-axis: months, April 1964 to March 1965]
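When two lines are drawn on one graph, it helps to know where they cut each other. The following sketch uses an illustrative helper, `crossings`, to list the pairs of successive months between which the export and import lines of Illustration 19 cross. A month in which the two values are exactly equal would not be flagged by this simple sign test.

```python
months = ["Apr", "May", "June", "July", "Aug", "Sept",
          "Oct", "Nov", "Dec", "Jan", "Feb", "Mar"]
exports = [217, 220, 215, 210, 225, 227, 220, 218, 210, 215, 212, 215]
imports = [213, 219, 222, 215, 218, 225, 225, 227, 205, 209, 222, 218]


def crossings(a, b, labels):
    """Pairs of successive labels between which line a and line b cross."""
    diff = [x - y for x, y in zip(a, b)]
    return [(labels[i], labels[i + 1])
            for i in range(len(diff) - 1)
            if diff[i] * diff[i + 1] < 0]  # sign change means the lines cut


print(crossings(exports, imports, months))
```

Five crossings in twelve months show why the text advises distinguishing each line clearly: without different line styles, a reader could easily follow the wrong series after a crossing.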
Illustration 20. Present the following data through a line graph:


VEGETABLE OIL PRODUCTION (VANASPATI) ('000 TONNES)


Description  Jan.  Feb.  Mar.  Apr.  May  June  July  Aug.  Sept.  Oct.  Nov.  Dec.
(1)          (2)   (3)   (4)   (5)   (6)  (7)   (8)   (9)   (10)   (11)  (12)  (13)
Production 


8 39:8 
1966 342 292 209 301 3L7 276 275 240 281 249 323 386. 
7 sh *34:3 


26 3*4 330 382 31:3 355 362 
274 255 254 274 285 320 358 


Range Chart 

It is a very good method of showing the range of variation, i.e., the
minimum and maximum values of a variable. For example, if we are
interested in showing the minimum and maximum prices of a commodity
for different periods of time, the minimum and maximum temperatures,
or the minimum and maximum prices of the shares of some company for
different periods, the range chart would be very appropriate.

Illustration 21. Present the following data by a suitable graph:


MINIMUM AND MAXIMUM PRICES OF GOLD FOR 5 GMS.
FOR THE YEAR 1972


Months      Highest   Lowest    Months      Highest   Lowest
            price     price                 price     price
            (Rs.)     (Rs.)                 (Rs.)     (Rs.)
January     160.0     157.0     July        175.0     163.2
February    162.2     156.0     August      175.8     160.0
March       165.0     160.3     September   172.2     165.0
April       166.5     162.4     October     178.0     168.0
May         168.2     160.5     November    171.0     165.0
June        170.0     161.9     December    ...       167.0


378 360 854 326 328 32:0 31:6 334 338 401 - 
Solution:


The above data can best be represented through a range chart. The
following are the steps in constructing such a chart:


1. Take time on the X-axis and the variable on the Y-axis.


2. Draw two curves by plotting the given data, one curve
representing the highest values and the other the lowest values. In the
given case, curve A represents the lowest prices, whereas curve B represents
the highest prices. The gap between curves A and B represents the range
of variation.


3. To emphasise the difference between the lowest and highest
values, colour or shading should be used.
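The width of the band between curves A and B is just the difference between the highest and lowest prices. The sketch below computes it for January to November only, since the December highest price is not legible in the table. The prices are read with one decimal place, and the names are illustrative.

```python
# Illustration 21: gold prices (Rs. per 5 gms.), 1972, January to November.
highest = {"Jan": 160.0, "Feb": 162.2, "Mar": 165.0, "Apr": 166.5, "May": 168.2,
           "June": 170.0, "July": 175.0, "Aug": 175.8, "Sept": 172.2,
           "Oct": 178.0, "Nov": 171.0}
lowest = {"Jan": 157.0, "Feb": 156.0, "Mar": 160.3, "Apr": 162.4, "May": 160.5,
          "June": 161.9, "July": 163.2, "Aug": 160.0, "Sept": 165.0,
          "Oct": 168.0, "Nov": 165.0}


def ranges(high, low):
    """Vertical width of the band between curve B (highest) and curve A (lowest)."""
    return {m: round(high[m] - low[m], 1) for m in high}


r = ranges(highest, lowest)
widest = max(r, key=r.get)
print(r)
print("Widest range:", widest, r[widest])
```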


[Figure: RANGE CHART OF GOLD PRICES FOR 1972; curve A (lowest prices) and curve B (highest prices); X-axis: months, Jan. to Dec.]


Band Graphs


A band graph is a type of line graph which shows the total for
successive time periods broken up into sub-totals for each of the component
parts of the total. In other words, the band graph shows how an


A band graph can also be used where the data are put in percentage
form; the whole chart will then depict 100% and the bands the percentage
each component bears to the whole.
Illustration 22. Represent the following data about agricultural production in
India by a suitable graph:

PRODUCTION IN MILLION TONNES
Year   Rice   Wheat   Pulses   Other Cereals   Total
1962   30     10      10       14              64
1963   32     11      ...      18              69
1964   33     8.5     11.5     20              73
1965   35     12      11       20              78
1966   36     10      10       22              78
1967   38     11      9        23              81


The above data can be most suitably presented through a band
graph. The procedure for constructing such a graph is as follows:


[Figure: PRODUCTION OF PRINCIPAL CROPS IN INDIA; X-axis: years, 1962 to 1967; Y-axis: production (million tonnes)]


1. Take the years on the X-axis and the variables on the Y-axis.


2. Plot the various points for different years for rice and join them
by straight lines. This is represented by line A.

3. Add the figures of rice for the various years to the figures of wheat,
plot the points and join them by straight lines. This is represented
by line B. The difference between the two lines, i.e., B and A, gives us
the production of wheat.

4. Add the figures of rice and wheat to those of pulses and plot the various
points. This is represented by curve C. The difference between curve C
and curve B represents the production of pulses.

5. Add the figures of rice, wheat and pulses to those of other cereals and
draw a curve. This is represented by D. The difference between D and C
gives the production figures for other cereals.
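The heights of lines A to D are running totals of the components. Below is a minimal sketch using `itertools.accumulate`, restricted to the years whose four components are fully legible in the table (1962 and 1965 to 1967). The name `band_boundaries` is illustrative.

```python
from itertools import accumulate

# Components in the order rice, wheat, pulses, other cereals (million tonnes).
production = {
    1962: (30, 10, 10, 14),
    1965: (35, 12, 11, 20),
    1966: (36, 10, 10, 22),
    1967: (38, 11, 9, 23),
}


def band_boundaries(components):
    """Heights of lines A, B, C and D: running totals of the components."""
    return list(accumulate(components))


for year, parts in production.items():
    print(year, band_boundaries(parts))
```

The top line D always equals the total column, which is a useful check that no component has been missed when stacking the bands.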


Semi-Logarithmic Line Graphs or Ratio Charts
The different types of graphs discussed so far have been drawn on a
natural or arithmetic scale. Such graphs indicate the absolute changes
in the values of a variable from one period to another. Thus, if the
profits of a firm rise from Rs. 1,000 to Rs. 2,000 and from Rs. 2,000 to
Rs. 3,000 over 3 years, on an arithmetic scale the points would fall in a
straight line, indicating the same absolute change from one period to
another, i.e., Rs. 1,000 in each case. However, very often we are not
interested in the absolute amount of change in a variable; rather, our
interest lies in


are not so important as the rate at which profits or sales, etc., are increasing
or decreasing. When relative rates of change are to be studied, the arithmetic
scale is of little use. In such cases we make use of the logarithmic or the
ratio scale. On a ratio scale, equal vertical distances indicate equal relative


scale and thus absolute movements are studied. On a ratio scale,
however, equal distances on the scale measure equal proportional move-
ments. This would be clear from the following example:




1. It is clear from the above that the natural scale is based on
arithmetic progression whereas the ratio scale is based on geometric
progression.
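The difference between the two scales can be seen by computing the distances at which a geometric progression would be plotted. In this sketch, distance on the natural scale is taken as the value itself, and distance on the ratio scale as its common logarithm. The helper `gaps` is illustrative.

```python
import math

values = [1, 2, 4, 8, 16]  # a geometric progression with ratio 2

# Natural scale: distance from the origin is proportional to the value itself.
natural = values
# Ratio scale: distance from the origin is proportional to log10 of the value.
ratio = [math.log10(v) for v in values]


def gaps(positions):
    """Distances between successive plotted points, rounded to 4 places."""
    return [round(b - a, 4) for a, b in zip(positions, positions[1:])]


print(gaps(natural))  # [1, 2, 4, 8]: the gaps widen on a natural scale
print(gaps(ratio))    # [0.301, 0.301, 0.301, 0.301]: equal ratios, equal gaps
```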


2. The natural scale indicates absolute changes whereas the ratio scale
indicates rates of change.


changes, if shown on the graph, are misleading. But


the use of the ratio scale prevents one from drawing wrong conclusions. The
significance of the ratio scale would be clear from the following table:
PRODUCTION OF WHEAT, 1970-1976
Years   Production of wheat   Absolute   Percentage
        (… tonnes)            increase   increase
1970    10                    -          -
1971    20                    10         100.0
1972    30                    10         50.0
1973    40                    10         33.3
1974    50                    10         25.0
1975    60                    10         20.0
1976    70                    10         16.7
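The two increase columns of the table can be reproduced as follows. This is a sketch in which `changes` is an illustrative helper returning the absolute increase and the percentage increase over the previous year, rounded to one decimal place.

```python
years = list(range(1970, 1977))
production = [10, 20, 30, 40, 50, 60, 70]


def changes(series):
    """(absolute increase, percentage increase) of each value over the previous one."""
    rows = []
    for prev, curr in zip(series, series[1:]):
        rows.append((curr - prev, round(100 * (curr - prev) / prev, 1)))
    return rows


for year, (absolute, percent) in zip(years[1:], changes(production)):
    print(year, absolute, percent)
```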
Thus, though the absolute increase is constant, the relative increase
or rate of increase is declining. It may happen with an expanding
business that the turnover and profits rise each year but the rate of
expansion declines. In such a case, if one compares absolute
changes, the conclusions would be misleading. This would be clear
from the following example. Assume two companies whose net profits
over a period of four years rise in both cases by Rs. 40,000. The follow-
ing table compares their performance:


                                          Years
                                  Co.     1973    1974    1975    1976
Profit in Rs. '000                A       10      20      40      50
                                  B       70      80      100     110
Absolute increase in Rs. '000     A       -       10      20      10
                                  B       -       10      20      10
Percentage of relative increase   A       -       100.0   100.0   25.0
each year over the previous year  B       -       14.3    25.0    10.0


If we compare the absolute profits, we would say that both the com-
panies are equally good for purposes of investment, for the simple reason
that both of them experience the same absolute increase in profits. But
a close examination of the data would reveal that company A is much better
for purposes of investment because the rate of increase in profits is much
higher in this company than in company B. Thus, where the primary
consideration is the rate of change in a series such as population,
production, sales, profits, national income or per capita income, rather
than the absolute figures, a logarithmic scale graph is more useful.
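On a ratio scale, the vertical rise between two points equals the difference of their logarithms, i.e. the logarithm of their ratio. The sketch below, with illustrative helper names, shows that the two companies have the same absolute rises but that company A's curve rises much more on a ratio scale.

```python
import math

# Net profits (Rs. '000) of companies A and B, from the table above.
profits = {"A": [10, 20, 40, 50], "B": [70, 80, 100, 110]}


def absolute_rises(series):
    return [b - a for a, b in zip(series, series[1:])]


def ratio_scale_rises(series):
    """Vertical rise between successive points on a ratio scale (log10 units)."""
    return [round(math.log10(b / a), 4) for a, b in zip(series, series[1:])]


for firm, series in profits.items():
    print(firm, absolute_rises(series), ratio_scale_rises(series))
```

Both firms rise by 10, 20 and 10 thousand rupees, but company A's curve climbs about 0.70 log units in total against about 0.20 for company B. That is the gap an arithmetic chart hides.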


3. The ratio scale enables us to compare the rates of change of
series measured in different statistical units on the same chart. Many curves
can be plotted on the same graph and their trends studied. For example,
the trends of population, production of agricultural commodities, prices,
national income, employment, etc., can all be studied on one graph.


4. In the case of the ratio scale, the Y-axis starts from one and not from
zero, whereas in the case of the natural scale the Y-axis starts from zero. The
reason the scale on semi-logarithmic paper starts at 1 and not zero is that
the logarithm of 1 is 0; hence the value 1 is placed at zero distance from the
origin, i.e., at the origin. There is no logarithm for zero or for negative
numbers; hence such values cannot be plotted.


5. On the logarithmic scale chart, a constant rate of increase or
decrease can be easily noticed, and this property is of great use in extra-
polation, particularly in business life, where forecasts have to be made on
the assumption that the rise or fall will be at the same rate.


6. In the case of variables having a wide range of values, the ratio scale
graph is far more suitable than the natural scale graph.


7. In the case of the ratio scale, the meaning of the data is derived from
the direction of the lines, whereas in the case of the natural scale the
meaning is derived from the position of the lines.


E-6:34 DIAGRAMMATIC AND GRAPHIC PRESENTATION: 


Method of Constructing a Semi-Logarithmic Graph. A semi-
logarithmic graph can be constructed in either of the following ways:


1. By plotting the logarithms of the given values on a natural scale.
2. By plotting the given values on semi-logarithmic paper.


When the first method is adopted, the logarithms of the various
values of the variable are obtained by consulting logarithmic tables.
These logarithms are then plotted on the Y-axis of the natural scale
and the various points are joined by straight lines to give the required
curve.


When the second method is adopted, we do not calculate the loga-
rithms of the values of the variable. Rather, the actual values are plotted
on the semi-logarithmic paper. This method is simple and convenient as
compared to the first one because one does not have to calculate the
logarithms of the values, and hence there is a considerable saving in time.


When a graph is prepared by either of the above two
methods, it is known as a semi-logarithmic graph for the simple reason
that the vertical scale is ruled on the ratio principle but the horizontal scale
remains on the arithmetic principle. It is also possible to have a line
graph with both scales logarithmic. Such a graph is sometimes
called a double logarithmic graph. However, it is better to call it simply
a logarithmic graph to distinguish it clearly from a semi-logarithmic graph.
The use of such a graph is very limited.


Illustration 23. The following table shows the total sale of Gold Bonds by the
Reserve Bank of India:


Months        Rs. ('000)   Months        Rs. ('000)
Oct. 1965     15,560       April 1966    3,250
Nov. 1965     13,170       May 1966      3,570
Dec. 1965     18,740       June 1966     3,620
Jan. 1966     12,450       July 1966     3,140
Feb. 1966     8,320        Aug. 1966     2,580
March 1966    7,540        Sept. 1966    2,540

Represent the data graphically on the logarithmic scale on a piece of graph
paper or on plain paper. (B. Com., Andhra, 1972)


Solution. Taking the logarithms of the given values and plotting them on a
natural scale:


Months       Rs. ('000)   Log.      Months       Rs. ('000)   Log.
Oct. 1965    15,560       4.1920    April 1966   3,250        3.5119
Nov. 1965    13,170       4.1196    May 1966     3,570        3.5527
Dec. 1965    18,740       4.2728    June 1966    3,620        3.5587
Jan. 1966    12,450       4.0952    July 1966    3,140        3.4969
Feb. 1966    8,320        3.9201    Aug. 1966    2,580        3.4116
Mar. 1966    7,540        3.8774    Sept. 1966   2,540        3.4048
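The logarithms in the table can be reproduced with Python's `math.log10`, which takes the place of consulting logarithmic tables in the first method. This is a minimal sketch. The values are rounded to four decimal places, so the last digit may occasionally differ from a printed four-figure log table.

```python
import math

# Illustration 23: sale of Gold Bonds (Rs. '000).
sales = {
    "Oct. 1965": 15560, "Nov. 1965": 13170, "Dec. 1965": 18740,
    "Jan. 1966": 12450, "Feb. 1966": 8320, "Mar. 1966": 7540,
    "Apr. 1966": 3250, "May 1966": 3570, "June 1966": 3620,
    "July 1966": 3140, "Aug. 1966": 2580, "Sept. 1966": 2540,
}

# Method 1: plot log10 of each value against an ordinary arithmetic Y-scale.
logs = {month: round(math.log10(v), 4) for month, v in sales.items()}

for month, lg in logs.items():
    print(f"{month:11s} {sales[month]:>7,}  {lg:.4f}")
```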
[Figure: SALE OF GOLD BONDS, OCT. 1965 TO SEPT. 1966, LOGARITHMIC SCALE; X-axis: months]


Illustration 25. The following are the figures of sales of two firms A and B for
the years 1960-1967. Present the data graphically:


Years   Sales Firm A       Sales Firm B       Years   Sales Firm A       Sales Firm B
        (thousand units)   (thousand units)           (thousand units)   (thousand units)
1960    200                2,000              1964    600                6,000
1961    300                3,000              1965    700                7,000
1962    400                4,000              1966    800                8,000
1963    500                5,000              1967    900                9,000
Solution : 


Let us plot the above data both on a natural scale and on a ratio scale (i.e.,
taking logs of the various values) and compare the two graphs. The following is the
graph of the data on a natural scale (graph A).


[Figure: GROWTH OF SALES OF FIRMS A AND B (graph A, natural scale)]
For presenting these data on a ratio scale, take the logarithms of the various values:


Years   Sales Firm A       Logs.    Sales Firm B       Logs.
        (thousand units)            (thousand units)
1960    200                2.3010   2,000              3.3010
1961    300                2.4771   3,000              3.4771
1962    400                2.6021   4,000              3.6021
1963    500                2.6990   5,000              3.6990
1964    600                2.7782   6,000              3.7782
1965    700                2.8451   7,000              3.8451
1966    800                2.9031   8,000              3.9031
1967    900                2.9542   9,000              3.9542


A comparison of the two graphs reveals that when the data
are plotted on a natural scale, they show a much higher rate of progress in
the case of firm B than of firm A. But when the data are plotted
on a ratio scale, they indicate that the rate of growth is the same in both
firms. In fact, the sales of firm A are rising by 100 and those of firm
B by 1,000 (thousand units) every year, and so one may form the impression
that firm B is 10 times more progressive than firm A. However, this con-
clusion would be drawn only if one compares absolute changes. If a
comparison of relative changes is made, it would be clear that the rate of
growth is the same for both firms A and B, as is clearly shown by
graph B.
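The reason the two ratio-scale curves are parallel can be checked directly. The vertical gap between them is log10(B/A), which is constant because firm B's sales are always ten times firm A's. This is an illustrative sketch, and the variable names are hypothetical.

```python
import math

sales_a = [200, 300, 400, 500, 600, 700, 800, 900]          # firm A, thousand units
sales_b = [2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]  # firm B, thousand units

# Vertical distance between the two curves on a ratio scale in each year.
vertical_gap = [round(math.log10(b) - math.log10(a), 4) for a, b in zip(sales_a, sales_b)]

# Year-on-year percentage growth, which is what the slope on a ratio scale reflects.
growth_a = [round(100 * (y - x) / x, 1) for x, y in zip(sales_a, sales_a[1:])]
growth_b = [round(100 * (y - x) / x, 1) for x, y in zip(sales_b, sales_b[1:])]

print(vertical_gap)          # the same gap every year, so the curves run parallel
print(growth_a == growth_b)  # True: identical rates of growth
```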


Interpretation of Logarithmic Curves. The logarithmic curves 
must be interpreted with caution otherwise there is a possibility of jumping 
to wrong conclusions. The following are some of the important points 
that should be kept in mind while interpreting such curves : 
1. If a curve is rising upwards, it indicates an increase in the variable
(a positive rate of change).

2. If a curve is falling downwards, it indicates a decrease in the variable
(a negative rate of change).


3. If a curve is a straight line, the rate of change is constant or
uniform.

4. If a curve is rising but is nearly straight, it represents growth
at a nearly uniform rate.


5. If a curve is falling but is nearly straight, it represents a decline
at a nearly uniform rate.

6. If a curve is steeper in one portion than in another, the
rate of change in the former is more rapid than that in the latter.

7. If two curves on the same ratio chart run parallel,
they represent equal percentage rates of change.

8. If one curve is steeper than another on the same ratio chart, the
rate of change in the former is more rapid than that in the latter.


Uses of Ratio Charts. The ratio charts are useful for four types 
of comparison : 


(1) A constant percentage rate of growth is represented by a straight
line; for example, sales increasing by 10 per cent a year appear on the ratio
chart as a straight line. If the series curves away from the straight line, it
denotes a corresponding change in the rate of growth or the rate of decline,
as shown in the following chart:
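The shape rules above can be expressed in terms of the successive rises of the logarithms. Equal rises give a straight line, growing rises a curve bending upwards, and shrinking rises a curve bending downwards. The classifier below is a rough sketch with hypothetical names and an arbitrary tolerance, and real series would rarely fit one pattern exactly.

```python
import math


def classify(series, tol=1e-9):
    """Describe the shape a positive series would take on a ratio chart."""
    logs = [math.log10(v) for v in series]
    slopes = [b - a for a, b in zip(logs, logs[1:])]     # rise per period
    steps = [b - a for a, b in zip(slopes, slopes[1:])]  # change in that rise
    if all(abs(s) <= tol for s in steps):
        return "straight line: constant rate of change"
    if all(s > tol for s in steps):
        return "curving upward: rate of growth increasing"
    if all(s < -tol for s in steps):
        return "curving downward: rate of growth decreasing"
    return "mixed"


print(classify([100 * 1.1 ** k for k in range(6)]))  # sales rising 10 per cent a year
print(classify([10, 20, 30, 40, 50, 60, 70]))        # constant absolute increase
```

The second example is the wheat series from earlier. It rises by a constant amount, so on a ratio chart it bends downwards, because its percentage rate of increase keeps falling.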


[Figure: MEANING OF CURVE SHAPES ON RATIO CHART]
By observing a company's production curve on a ratio chart, the
analyst can determine whether or not it is maintaining its past rate of


(3) Percentages or ratios may be read directly from the vertical scale
and applied to further graphic analysis.


(4) Ratio scale is extremely useful in comparing series which differ 
widely in magnitude. 

Limitations of Ratio Charts. Despite the great significance of
semi-logarithmic graphs, they are seldom used in business, industry
and related activities. The following are some of the limitations of such
charts:

1. They are difficult for the layman to understand and so should 
not be used for illustrations which an arithmetic chart could show as well. 


5. The ratio scale cannot measure absolute changes.


Because of these limitations, this type of chart, although extremely useful
under appropriate conditions, is recommended only when the data cannot be
shown satisfactorily by some other means, such as conventional percentage
measurement.


GRAPHS OF FREQUENCY DISTRIBUTIONS 
A frequency distribution can be presented graphically in any of the
following ways :
1. Histogram.
2. Frequency polygon.
3. Smoothed frequency curve.
4. 'Ogives' or cumulative frequency curves.


1. Histogram


then represented by a distance on the scale that is proportional to its
axis shall remain
the same in case the class-intervals are uniform throughout; if they are
different, the width of the rectangles shall also vary. The Y-axis represents
the frequencies of each class which constitute the height of its rectangle. 
In this manner, we get a series of rectangles each having a class-interval 
distance as its width and the frequency distance as its height. The area 
of the histogram represents the total frequency as distributed throughout 
the classes. 


The histogram should be clearly distinguished from a bar diagram. 
The distinction lies in the fact that whereas a bar diagram is one- 
dimensional, i.e., only the length of the bar is material and not the width, 
a histogram is two-dimensional, that is, in a histogram both the length 
as well as the width are important. 


The histogram is most widely used for graphical presentation of a
frequency distribution. However, we cannot construct a histogram for a
distribution with open-end classes. Moreover, a histogram can be quite
misleading if the distribution has unequal class-intervals and suitable
adjustments in frequencies are not made.


The technique of constructing a histogram is given below (i) for
distributions having equal class-intervals, and (ii) for distributions having
unequal class-intervals.


When class-intervals are equal, take frequency on the Y-axis and the
variable on the X-axis and construct adjacent rectangles. In such a case
the heights of the rectangles will be proportional to the frequencies.


Illustration 26. Draw a histogram from the following data and measure the
modal value.*

Size class    Frequency    Size class    Frequency
0-10          5            50-60         10
10-20         11           60-70         8
20-30         19           70-80         6
30-40         21           80-90         3
40-50         16           90-100        1

(B. Com., Delhi, 1969)


HISTOGRAM (X-axis: size class, 0 to 100; Y-axis: frequency)


*For modal value please refer to the chapter on Measures of Central Value.
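The graphical reading of the mode from a histogram (joining the top corners of the tallest rectangle crosswise to the tops of the adjacent rectangles and dropping a perpendicular from the crossing point) can be checked arithmetically, since the crossing point lies at lower limit + d1/(d1 + d2) × width. The Python sketch below assumes equal class-intervals; the function name is illustrative. For Illustration 26 the modal class is 30-40 and the estimate is about 32.86. The result uses only the frequencies 19, 21 and 16, so it does not depend on the frequency of the 10-20 class.

```python
def mode_from_histogram(lower_limits, frequencies, width):
    """Mode located as on a histogram with equal class-intervals.

    The crossing lines drawn inside the tallest (modal) rectangle meet at
    lower limit + d1 / (d1 + d2) * width, where d1 and d2 are the excesses of
    the modal frequency over the preceding and following frequencies.
    """
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])  # modal class
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0                      # preceding class
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0   # following class
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("modal rectangle is no taller than its neighbours")
    return lower_limits[k] + d1 / (d1 + d2) * width

# Illustration 26: classes 0-10, 10-20, ..., 90-100
lower = list(range(0, 100, 10))
freq = [5, 11, 19, 21, 16, 10, 8, 6, 3, 1]
print(round(mode_from_histogram(lower, freq, 10), 2))  # 32.86
```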
When class-intervals are unequal, the frequencies must be adjusted
before constructing the histogram. For making the adjustment we take
the class which has the lowest class-interval and adjust the frequencies of
other classes in the following manner. If one class-interval is twice as
wide as the lowest class-interval, we divide the height of its
rectangle by two; if it is three times as wide, we divide the height of its
rectangle by three, and so on; i.e., the heights will be proportional to the ratio of
the frequencies to the widths of the classes.


This would be clear from the following example :
Illustration 27. Represent the following data by means of a histogram :

Weekly wages    No. of     Weekly wages    No. of
(in Rs.)        workers    (in Rs.)        workers
10-15           7          30-40           12
15-20           19         40-60           12
20-25           27         60-80           8
25-30           15
(C.A., 1972)


Solution. Since the class-intervals are unequal, frequencies must be adjusted;
otherwise the histogram would give a misleading picture. The adjustment is done as
follows. The lowest class-interval is 5. The frequency of the class 30-40 shall be
divided by two since its class-interval is double, that of 40-60 by four, and so on.


HISTOGRAM (X-axis: weekly wages in rupees)
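The adjustment rule amounts to multiplying each frequency by (width of the narrowest class ÷ width of the class). A minimal Python sketch, with a helper name (`adjusted_heights`) chosen for illustration, applied to the Illustration 27 data:

```python
def adjusted_heights(classes, frequencies):
    """Heights of histogram rectangles when class-intervals are unequal.

    Each frequency is scaled by (narrowest width / own width): a class twice
    as wide as the narrowest one has its frequency halved, and so on.
    """
    widths = [upper - lower for lower, upper in classes]
    narrowest = min(widths)
    return [f * narrowest / w for f, w in zip(frequencies, widths)]

# Illustration 27: weekly wages (Rs.) and number of workers
wages = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 40), (40, 60), (60, 80)]
workers = [7, 19, 27, 15, 12, 12, 8]
print(adjusted_heights(wages, workers))
# [7.0, 19.0, 27.0, 15.0, 6.0, 3.0, 2.0]
```

The same function reproduces the adjusted frequencies of Illustration 34 (7, 12, 15, 9, 2 and 1) when given the classes 0-50, 50-150, 150-250, 250-400, 400-600 and 600-800 with frequencies 7, 24, 30, 27, 8 and 4.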


Construction of Histogram when only Mid-points are given.
When only mid-points are given, ascertain the upper and lower limits of
the various classes and then construct the histogram in the same manner.
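Ascertaining the limits amounts to going half the gap between successive mid-points below and above each mid-point. A minimal Python sketch, assuming equally spaced mid-points as in Illustration 28 (the function name is illustrative):

```python
def limits_from_midpoints(midpoints):
    """Class limits from equally spaced mid-points: each class extends half
    the common gap below and above its mid-point."""
    half = (midpoints[1] - midpoints[0]) / 2
    return [(m - half, m + half) for m in midpoints]

print(limits_from_midpoints([1010, 1030, 1050, 1070, 1090]))
# [(1000.0, 1020.0), (1020.0, 1040.0), (1040.0, 1060.0), (1060.0, 1080.0), (1080.0, 1100.0)]
```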


Illustration 28. Draw histograms of the following frequency distributions :

Life of electric lamps
(in hours), mid-values    1,010    1,030    1,050    1,070    1,090
Firm A                    10       130      482      360      18
Firm B                    287      105      26       230      352
(I.C.W.A., 1973)


Solution. Since we are given the mid-points, we should ascertain the class limits for
constructing the histogram.

Life of Electric    Frequency    Frequency
Lamps (hours)       Firm A       Firm B
1,000-1,020         10           287
1,020-1,040         130          105
1,040-1,060         482          26
1,060-1,080         360          230
1,080-1,100         18           352


HISTOGRAM (FIRM A)     HISTOGRAM (FIRM B)
(X-axis: life in hours, 1,000 to 1,100; Y-axis: frequency)


It should be noted that the histogram, although adequately serving
its purpose, is too impersonal ever to become acceptable to, or to make any
real impact upon, the ordinary citizen.


2. Frequency Polygon* 


A frequency polygon is a graph of frequency distribution. It has
more than four sides. It is particularly effective in comparing two or more
frequency distributions. There are two ways in which a frequency polygon
may be constructed.

1. We may draw a histogram of the given data and then join by
straight lines the mid-points of the upper horizontal side of each rectangle
with the adjacent ones. The figure so formed is called a frequency polygon.
Some statisticians, however, prefer to close both the ends of the polygon by
extending them to the base line. When this is done, two hypothetical classes,
one at each end, would have to be included, each with a frequency of zero.
This extension is made with the object of making the area under the polygon
equal to the area under the corresponding histogram. Students are
advised to follow this practice.


2. Another method of constructing a frequency polygon is to take the
mid-points of the various class-intervals, plot the frequency corresponding
to each point and join all these points by straight lines. The
figure obtained would be exactly the same as that obtained by method No. 1.
The only difference is that here we do not have to construct a histogram.
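The second method, together with the practice of adding a zero-frequency class at each end, can be sketched in Python as follows; the sketch also checks that the closed polygon encloses the same area as the histogram. It assumes equal class-intervals and uses the leaf-length data of Illustration 29; the function names are illustrative.

```python
def polygon_points(lower_limits, width, frequencies):
    """Vertices of a closed frequency polygon: frequencies plotted at class
    mid-points, plus a zero-frequency class added at each end."""
    mids = [lo + width / 2 for lo in lower_limits]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

def area_under(points):
    """Area under a broken line, summing one trapezium per segment."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Illustration 29: length of leaves (cm), classes 6.5-7.5, ..., 12.5-13.5
lower = [6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
width = 1.0
freq = [5, 12, 25, 48, 32, 6, 1]
pts = polygon_points(lower, width, freq)
print(area_under(pts), width * sum(freq))  # both 129.0: polygon area = histogram area
```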


By constructing a frequency polygon the value of mode can be easily
ascertained. If from the apex of the polygon a perpendicular is drawn on
the X-axis, we get the value of mode. Moreover, frequency polygons
facilitate comparison of two or more frequency distributions on the same
graph.

* “Polygon” literally means ‘many-angles’.

Frequency polygon has a special advantage over the histogram. The
frequency polygons of several distributions may be plotted on the same
axes, thereby making certain comparisons possible, whereas histograms
cannot be usefully employed in the same way. To compare histograms we
must have a separate graph for each. Because of this limitation of
histograms, frequency polygons are preferred for making a graphic
comparison of frequency distributions.* Also, generally speaking, histograms
are preferable when classes are few, and frequency polygons when classes are numerous.


Illustration 29. Draw a histogram, frequency polygon and frequency curve representing
the following data.

Length of Leaves    No. of Leaves    Length of Leaves    No. of Leaves
6.5-7.5 cm.         5                10.5-11.5 cm.       32
7.5-8.5 ,,          12               11.5-12.5 ,,        6
8.5-9.5 ,,          25               12.5-13.5 ,,        1
9.5-10.5 ,,         48
(B. Com. Mysore, 1972)
Solution :


When the second method of constructing frequency polygon is used, 
the graph would take the same shape as above with a difference that there 
would be no histogram. 


In the construction of a frequency polygon the same difficulties are
faced as with histograms, i.e., polygons cannot be used for distributions having
open-end classes, and suitable adjustment, as in the case of histograms, is
necessary when there are unequal class-intervals.


* To make comparison of frequency distributions, percentage frequencies are often
used. Accordingly, to compare frequency polygons it is advisable to plot percentage
frequencies.

FREQUENCY POLYGON (X-axis: length of leaves in cm)


3. Smoothed Frequency Curve 


A smoothed frequency curve can be drawn through the various points
of the polygon. The curve is drawn freehand in such a manner that the
area included under the curve is approximately the same as that of the
polygon. The object of drawing a smoothed frequency curve is to eliminate
as far as possible all accidental variations that might be present in the data.
While smoothing a frequency polygon, the fact that it is really derived
from the histogram should always be kept in mind. This would imply
that the top of the curve would overtop the highest point of the polygon,
particularly when the magnitude of the class-interval is large. The curve
should look as regular as possible and all sudden turns should be avoided.
The extent of smoothing would, however, depend upon the nature of the
data. If it is a natural phenomenon like tossing of coins, smoothing may
be freely resorted to, as such phenomena normally have symmetrical curves;
but if the phenomenon is social or economic, the curve is generally skewed
and as such smoothing cannot be carried too far.


For drawing a smoothed frequency curve it is necessary first to draw
the polygon and then smooth it out. As discussed earlier, the polygon can
be constructed even without first constructing a histogram, by plotting the
frequencies at the mid-points of class-intervals. This may save some time,
but the smoothing of the polygon cannot be done properly without a
histogram. Hence, it is desirable to proceed in sequence, i.e., first to
draw a histogram, then a polygon and lastly to smooth it to obtain the
smoothed frequency curve. This curve should begin and end at the base
line and, as a general rule, it may be extended to the mid-points of the
class-intervals just outside the histogram. The area under the curve should
represent the total number of frequencies in the entire distribution.


The following points should be kept in mind while smoothing a 
frequency curve : 

1. Only frequency distributions based on samples should be 
smoothed. 

2. Only continuous series should be smoothed. 

3. The total area under the curve should be equal to the area under 
the original histogram or polygon. 
4. Cumulative Frequency Curves or ‘Ogives’


At times we are interested in knowing ‘how many workers of a factory
earn less than Rs. 100 per month’ or ‘how many workers earn more than
Rs. 300 per month’, ‘the percentage of students who have failed’, etc. To
answer these questions it is necessary to add the frequencies. When
frequencies are added, they are called cumulative frequencies. These
frequencies are then listed in a table called a cumulative frequency table.
The curve obtained by plotting cumulative frequencies is called a cumulative
frequency curve or an Ogive (pronounced ‘ojive’).*

There are two methods of constructing Ogive, namely— 

(a) The ‘less than’ method. 

(b) The ‘more than’ method. 

(a) ‘Less than’ method. In the ‘less than’ method we start with the 
upper limits of the classes and go on adding the frequencies. When these 
frequencies are plotted we get a rising curve. 

(b) ‘More than’ method. In the ‘more than’ method we start with the
lower limits of the classes and from the total frequency we subtract the
frequency of each class in turn. When these frequencies are plotted we get a
declining curve.

The following frequency distribution is converted into a cumulative
frequency distribution first by the ‘less than’ method and then by the
‘more than’ method :

Marks    No. of Students    Marks    No. of Students
10-20    4                  40-50    20
20-30    6                  50-60    18
30-40    10                 60-70    2

CUMULATIVE FREQUENCY DISTRIBUTIONS
Marks          No. of Students    Marks          No. of Students
‘Less than’                       ‘More than’
20             4                  10             60
30             10                 20             56
40             20                 30             50
50             40                 40             40
60             58                 50             20
70             60                 60             2
                                  70             0


From the above distribution one can read at once the number of 
students who have obtained marks less than a particular value or more 
than a particular value. Thus there are 20 students who have obtained 
marks less than 40 and 50 students who have obtained marks more than 30. 
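Both cumulative tables can be generated mechanically. A minimal Python sketch (the function name `ogive_tables` is chosen here for illustration) that reproduces the two columns above from the class limits and the simple frequencies:

```python
from itertools import accumulate

def ogive_tables(boundaries, frequencies):
    """Return the ('less than', 'more than') cumulative frequency tables.

    boundaries: class limits in order, one more than the number of classes,
    e.g. [10, 20, 30] for the classes 10-20 and 20-30.
    """
    total = sum(frequencies)
    # 'Less than' method: running totals, read against the upper limits.
    less = list(zip(boundaries[1:], accumulate(frequencies)))
    # 'More than' method: start from the total and subtract each class in
    # turn, read against the lower limits; the last limit gets 0.
    more = list(zip(boundaries, [total] + [total - c for c in accumulate(frequencies)]))
    return less, more

less, more = ogive_tables([10, 20, 30, 40, 50, 60, 70], [4, 6, 10, 20, 18, 2])
print(less)  # [(20, 4), (30, 10), (40, 20), (50, 40), (60, 58), (70, 60)]
print(more)  # [(10, 60), (20, 56), (30, 50), (40, 40), (50, 20), (60, 2), (70, 0)]
```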

Sometimes instead of writing ‘less than’ and ‘more than’ we write
‘or less’ and ‘or more’. The implication is different in the two cases.
Thus marks ‘less than 20’ would exclude 20 whereas marks ‘20 or less’
would include 20.


* It is so called because its shape resembles that of the rib of a Gothic arch.

Similarly, marks ‘more than 30’ would exclude 30 whereas marks ‘30
or more’ would include 30. One has to be very clear about the object in
mind before these terms are used.

Utility of Ogives. From the standpoint of graphic presentation, 
the ogive is especially used for the following purposes : 

1. To determine as well as to portray the number or proportion of
cases above or below a given value. 


2. To compare two or more frequency distributions. Generally 
there is less overlapping when comparing several ogives on the same grid 
than when comparing several simple frequency curves in this manner. 

3. Ogives are also drawn for determining certain values graphically 
such as median, quartiles, deciles, etc. 

Despite the great significance of ogives, it should be noted that they 
are not as simple to interpret as one may feel and hence the reader must 
be careful while using them. 

Illustration 30. Draw a histogram and the two ogives for the following data on
the size of families :
No. of children    0      1     2     3     4     5     6
No. of families    171    82    50    25    13    7     2
(B. Com., Dharwar, 1972)
Solution. Let us arrange these data in the form of a frequency distribution with
class-intervals :

No. of children    No. of families    No. of children    No. of families
0-1                171                4-5                13
1-2                82                 5-6                7
2-3                50                 6-7                2
3-4                25
HISTOGRAM (X-axis: number of children; Y-axis: no. of families)
No. of children    No. of families    No. of children    No. of families
Less than 1        171                Less than 5        341
Less than 2        253                Less than 6        348
Less than 3        303                Less than 7        350
Less than 4        328

No. of children    No. of families    No. of children    No. of families
0 or more          350                4 or more          22
1 or more          179                5 or more          9
2 or more          97                 6 or more          2
3 or more          47

OGIVES (X-axis: number of children; Y-axis: no. of families)
Illustration 31. Draw a percentile curve for the following distribution of marks
obtained by 700 students in an examination :

Marks    No. of students    Marks    No. of students
0-9      9                  50-59    102
10-19    42                 60-69    71
20-29    61                 70-79    23
30-39    140                80-89    2
40-49    250
Find from the graph (i) the marks at the 20th percentile, and (ii) the percentile
equivalent to a mark of 65. (B. Com., Delhi, 1971)


Solution. A percentile curve is a cumulative curve drawn on a percentage basis. 
Hence for drawing such a curve three steps are required : 

1. Find the cumulative frequencies of the given data by the ‘less than’ method.

2. Convert these cumulative frequencies into percentages of the total.

3. Take these percentages on the Y-axis and the variable on the X-axis, plot
the various points and join them by straight lines. The curve so drawn is known as the
percentile curve.


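Marks        Frequency    Cumulative    Cumulative
less than                 Frequency     Percentage
9.5          9            9             1.3
19.5         42           51            7.3
29.5         61           112           16.0
39.5         140          252           36.0
49.5         250          502           71.7
59.5         102          604           86.3
69.5         71           675           96.4
79.5         23           698           99.7
89.5         2            700           100.0

PERCENTILE CURVE (X-axis: marks, 9.5 to 89.5; Y-axis: cumulative percentage)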


It is clear from the graph that at the 20th percentile the marks are 31.5 and,
corresponding to 65 marks, the percentile is 90.
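Reading a percentile curve amounts to linear interpolation between the plotted points. A Python sketch under the same straight-line joining used above (function names are illustrative): it gives exactly 31.5 marks at the 20th percentile, and about 91.9 as the percentile for 65 marks, close to the value of 90 read from the graph in the text; a reading taken off a drawn curve will differ somewhat from exact interpolation.

```python
from itertools import accumulate

def percentile_points(upper_boundaries, frequencies):
    """Points of a percentile curve: 'less than' cumulative frequencies
    expressed as percentages of the total."""
    total = sum(frequencies)
    return [(x, 100 * c / total)
            for x, c in zip(upper_boundaries, accumulate(frequencies))]

def read_line(points, x):
    """Value at x on the straight lines joining `points` (x must be ascending)."""
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the plotted range")

# Illustration 31: classes 0-9, 10-19, ..., 80-89; upper boundaries 9.5, ..., 89.5
boundaries = [9.5 + 10 * k for k in range(9)]
students = [9, 42, 61, 140, 250, 102, 71, 23, 2]
curve = percentile_points(boundaries, students)

# (i) Marks at the 20th percentile: read the curve sideways (percentage -> marks).
print(read_line([(p, x) for x, p in curve], 20))  # 31.5
# (ii) Percentile equivalent to 65 marks.
print(round(read_line(curve, 65), 1))             # 91.9
```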
Illustration 32. The following table gives the average earnings of the mill
workers in a certain city :

Monthly wages in Rs.    Frequency    Monthly wages in Rs.    Frequency
18                      21           42                      36
21                      29           45                      45
24                      19           48                      27
27                      39           51                      48
30                      43           54                      21
33                      94           57                      12
36                      73           60                      5
39                      68

Draw a histogram and a frequency curve for the data given above. Find the
number of mill workers whose wages lie between Rs. 31 and Rs. 53.
(B. Com., Madras, 1969)
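Solution. Let us first arrange the data in the form of a frequency distribution
with class-intervals :

Monthly wages (Rs.)    Frequency    Monthly wages (Rs.)    Frequency
18-21                  21           39-42                  68
21-24                  29           42-45                  36
24-27                  19           45-48                  45
27-30                  39           48-51                  27
30-33                  43           51-54                  48
33-36                  94           54-57                  21
36-39                  73           57-60                  12
                                    60-63                  5

HISTOGRAM (X-axis: monthly wages in Rs.; Y-axis: frequency)

Monthly wages (Rs.)    Frequency    Monthly wages (Rs.)    Frequency
Less than 21           21           Less than 45           422
Less than 24           50           Less than 48           467
Less than 27           69           Less than 51           494
Less than 30           108          Less than 54           542
Less than 33           151          Less than 57           563
Less than 36           245          Less than 60           575
Less than 39           318          Less than 63           580
Less than 42           386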
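The number of workers earning between Rs. 31 and Rs. 53 can be read off the ‘less than’ ogive. The Python sketch below assumes, as the cumulative table does, that each wage figure is the lower limit of a class of width Rs. 3 (18-21, 21-24, ..., 60-63), and joins the ogive points by straight lines; under those assumptions the estimate is about 404 workers. A reading from a freehand curve may differ slightly. The helper name is illustrative.

```python
from itertools import accumulate

# Assumption: each wage figure is the lower limit of a class of width Rs. 3.
lower = list(range(18, 61, 3))                                        # 18, 21, ..., 60
freq = [21, 29, 19, 39, 43, 94, 73, 68, 36, 45, 27, 48, 21, 12, 5]

# 'Less than' ogive points, starting from 0 at the lower limit of the first class.
points = [(18, 0)] + list(zip([x + 3 for x in lower], accumulate(freq)))

def read_ogive(points, x):
    """Number of cases below x, by straight-line interpolation on the ogive."""
    for (x1, c1), (x2, c2) in zip(points, points[1:]):
        if x1 <= x <= x2:
            return c1 + (c2 - c1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the range of the ogive")

between = read_ogive(points, 53) - read_ogive(points, 31)
print(round(between))  # 404
```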
MISCELLANEOUS ILLUSTRATIONS
Illustration 33. Represent the following data by a suitable diagram :

Items of Expenditure    Family A            Family B
                        (Income Rs. 500)    (Income Rs. 300)
Food                    150                 150
Clothing                125                 60
Education               25                  50
Miscellaneous           190                 70
Savings or Deficit      +10                 -30

(M.A. Econ., Meerut, 1975)


Solution. The given data can best be represented by a rectangular diagram :


                        Family A              Family B
Items of Expenditure    Rs. 500 = 100%        Rs. 300 = 100%
                        Rs.      %            Rs.      %
Food                    150      30           150      50
Clothing                125      25           60       20
Education               25       5            50       16.67
Miscellaneous           190      38           70       23.33
Savings or deficit      +10      2            -30      -10
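The percentage columns are each item divided by the family's income and multiplied by 100. A small Python sketch reproducing them (dictionary keys follow the table; the function name is illustrative):

```python
def budget_percentages(income, items):
    """Each item of a family budget as a percentage of income, rounded to
    two decimals, as used for the divisions of a rectangular diagram."""
    return {name: round(100 * amount / income, 2) for name, amount in items.items()}

family_a = {"Food": 150, "Clothing": 125, "Education": 25,
            "Miscellaneous": 190, "Savings or deficit": 10}
family_b = {"Food": 150, "Clothing": 60, "Education": 50,
            "Miscellaneous": 70, "Savings or deficit": -30}

print(budget_percentages(500, family_a))  # 30, 25, 5, 38 and 2 per cent
print(budget_percentages(300, family_b))  # 50, 20, 16.67, 23.33 and -10 per cent
```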


RECTANGULAR DIAGRAM REPRESENTING
BUDGETS OF FAMILY A AND B
(Family A: Rs. 500; Family B: Rs. 300)
Illustration 34. You are given the following frequency distribution of monthly
expenditure on food incurred by a sample of 100 families in a town :

Expenditure    Frequency    Expenditure    Frequency
0-50           7            250-400        27
50-150         24           400-600        8
150-250        30           600-800        4

Draw the frequency polygon and the histogram for this distribution.
(B.A. Hons. Econ., Delhi, 1975)

Solution. Since class-intervals are unequal, the frequencies shall be adjusted.

Expenditure    Frequency    Adjusted     Expenditure    Frequency    Adjusted
                            Frequency                                Frequency
0-50           7            7            250-400        27           9
50-150         24           12           400-600        8            2
150-250        30           15           600-800        4            1

HISTOGRAM AND FREQUENCY POLYGON (X-axis: expenditure; Y-axis: adjusted frequency)

Illustration 35. The following table gives the height of trees :

Height            No. of trees    Height            No. of trees
Below 7 feet      26              Below 35 feet     216
Below 14 feet     57              Below 42 feet     287
Below 21 feet     92              Below 49 feet     341
Below 28 feet     134             Below 56 feet     360

Represent the data in the form of a histogram, a ‘less than’ ogive and a ‘more than’
ogive. (B.Sc. Agr. H.P.U., 1975)

Solution. To construct a histogram, first obtain simple frequencies :

Height    No. of trees    Height    No. of trees
0-7       26              28-35     82
7-14      31              35-42     71
14-21     35              42-49     54
21-28     42              49-56     19
For drawing ogives by the ‘less than’ and ‘more than’ methods we will plot the
following points :

Height               No. of trees    Height               No. of trees
Less than 7 feet     26              More than 0 feet     360
Less than 14 feet    57              More than 7 feet     334
Less than 21 feet    92              More than 14 feet    303
Less than 28 feet    134             More than 21 feet    268
Less than 35 feet    216             More than 28 feet    226
Less than 42 feet    287             More than 35 feet    144
Less than 49 feet    341             More than 42 feet    73
Less than 56 feet    360             More than 49 feet    19

HISTOGRAM OF THE HEIGHT OF TREES (X-axis: height in feet, 0 to 56)

‘LESS THAN’ AND ‘MORE THAN’ OGIVES (X-axis: height in feet; Y-axis: no. of trees)
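Because the question gives ‘less than’ (cumulative) figures, the simple frequencies for the histogram and the ‘more than’ totals are both obtained by differencing. A minimal Python sketch using the Illustration 35 data (variable names are illustrative):

```python
heights = [0, 7, 14, 21, 28, 35, 42, 49, 56]      # class limits in feet
below = [26, 57, 92, 134, 216, 287, 341, 360]     # trees below 7, 14, ..., 56 feet

# Simple frequencies: difference between successive 'less than' totals.
simple = [c - p for p, c in zip([0] + below[:-1], below)]

# 'More than' totals: trees above each lower limit (none above 56 feet).
total = below[-1]
above = [total] + [total - c for c in below]

print(simple)                     # [26, 31, 35, 42, 82, 71, 54, 19]
print(list(zip(heights, above)))  # (0, 360), (7, 334), ..., (49, 19), (56, 0)
```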
Limitations of Diagrams and Graphs

Although diagrams and graphs are a powerful and effective medium
for presenting statistical data, they are not under all circumstances and
for all purposes complete substitutes for tabular and other forms of
presentation. The well-trained specialist in this field is one who recognizes
not only the advantages but also the limitations of these techniques. He
knows when to use and when not to use these methods and from his
repertoire is able to select the most appropriate form for every purpose.
Julin has beautifully said, “Graphic statistics has a role to play of its own;
it is not the servant of numerical statistics, but it cannot pretend, on the
other hand, to precede or displace the latter.”

The main limitations of diagrams and graphs are :

1. They can present only approximate values. 

2. They can appropriately represent only a limited amount of information.

3. They are intended mostly to explain quantitative facts to the
general public. From the point of view of the statistician, they are not
of much help in analysing data.

4. They can be easily misinterpreted and, therefore, can be used
for grinding one's axe in advertisement, propaganda and electioneering.
As such, diagrams should never be accepted without a close inspection
of their bona fides, because things are very often not what they appear to be.

5. Two-dimensional and three-dimensional diagrams cannot be
accurately appraised visually and, therefore, as far as possible their use
should be avoided.




7 | Measures of Central Value 


One of the most important objectives of statistical analysis is to get
one single value that describes the characteristic of the entire mass of
unwieldy data. Such a value is called the central value or an ‘average’.
The word average* is very commonly used in day-to-day conversation.
For example, we often talk of the average boy in a class, the average height or
life of an Indian, average income, etc. When we say ‘he is an average
student’, what it means is that he is neither very good nor very bad, just a
mediocre type of student. However, in Statistics the term average has a
different meaning. It may be defined as that value of a distribution which
is considered the most representative or typical value for a group.
Such a value is of great significance because it depicts the characteristic of
the whole group. Since an average represents the entire data, its value lies
somewhere in between the two extremes, i.e., the largest and the smallest
items. For this reason an average is frequently referred to as a measure of
central tendency.


Objects of Averaging 
There are two main objects of the study of averages : 


(i) To get one single value that describes the characteristic of the entire
group. Measures of central value, by condensing the mass of data into one
single value, enable us to get a bird’s-eye view of the entire data. Thus
one value can represent thousands, lakhs and even millions of values. For
example, it is impossible to remember the individual incomes of
millions of earning people of India, and even if one could do it there would be
hardly any use. But if the average income is obtained by dividing the
total national income by the total population, we get one single value that
represents the entire population. Such a figure would throw light on the
standard of living of an average Indian.


(ii) To facilitate comparison. Measures of central value, by
reducing the mass of data to one single figure, enable comparisons to be
made. Comparison can be made either at a point of time or over a period
of time. For example, we can compare the percentage results of the
students of different colleges in a certain examination, say, B. Com. for
1976, and thereby conclude which college is the best, or we can compare
the pass percentage of the same college for different time periods and
thereby conclude whether the results are improving or deteriorating.
Such comparisons are of immense help in framing suitable and timely
policies. For example, if the pass percentage of students in College A in
B. Com. was 80 in 1975 and 65 in 1976, the authorities have sufficient
reason for investigating the possible cause of the deterioration in results.


* “An average is sometimes called a ‘measure of central tendency’ because
individual values of the variable usually cluster around it. Averages are useful,
however, for certain types of data in which there is little or no central tendency.”

Crum and Smith : Business Statistics.
However, while making comparisons one should also take into
consideration the multiplicity of forces that might be affecting the data.
For example, if per capita income is rising in absolute terms from one 
period to another, it should not lead one to think that the standard of 
living is necessarily improving because the prices might be rising faster 
than the rise in per capita income and so in real terms people might be 
worse off. Moreover, the same measure should be used for making 
comparison between two or more groups. For example, we should not 
compare the mean wage of one factory with the median wage of another 
factory for drawing any inference about wage levels. 


Requisites of a Good Average* 


Since an average is a single value representing a group of values, it 
is desirable that such a value satisfies the following properties : 


(i) It should be easy to understand. Since statistical methods are
designed to simplify complexity, it is desirable that an average be such that it
can be readily understood; otherwise, its use is bound to be very limited.


(ii) It should be simple to compute. Not only should an average be
easy to understand but it should also be simple to compute so that it can
be used widely. However, though ease of computation is desirable, it
should not be sought at the expense of other advantages, i.e., if in the
interest of greater accuracy the use of a more difficult average is desirable,
one should prefer that.


(iii) It should be based on all the items. The average should depend
upon each and every item of the series so that if any of the items is
dropped the average itself is altered. For example, the arithmetic mean
of 10, 20, 30, 40, 50 is

$$\frac{10+20+30+40+50}{5} = \frac{150}{5} = 30.$$

If we drop one item, say, 50, the arithmetic mean would be

$$\frac{10+20+30+40}{4} = \frac{100}{4} = 25.$$


(iv) It should not be unduly affected by extreme items. Although each
and every item should influence the value of the average, none of the items
should influence it unduly. If one or two very small or very large items 
unduly affect the average, i.e., either increase its value or reduce its value, 
the average cannot be really typical of the entire series. In other words, 
extremes may distort the average and reduce its usefulness. 


(v) It should be rigidly defined. An average should be properly
defined so that it has one and only one interpretation. It should preferably
be defined by an algebraic formula so that if different people compute the
average from the same figures they all get the same answer (barring
arithmetic mistakes). The average should not depend upon the personal
prejudice and bias of the investigator; otherwise the results can be
manipulated.


(vi) It should be capable of further algebraic treatment. We should
prefer to have an average that could be used for further statistical
computations so that its utility is enhanced. For example, if we are given
the data about the average income and number of employees of two
or more factories, we should be able to compute the combined average.

* Yule and Kendall : An Introduction to the Theory of Statistics, Section
5, Desiderata of a Satisfactory Average.
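For instance, the combined mean of two groups is (N₁X̄₁ + N₂X̄₂) / (N₁ + N₂), because N₁X̄₁ and N₂X̄₂ are the group totals. A minimal Python sketch; the two factories and their figures are hypothetical, not from the text, and the function name is illustrative:

```python
def combined_mean(groups):
    """Combined arithmetic mean of several groups, each given as
    (number of items, mean): the sum of the group totals divided by the
    total number of items."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n

# Hypothetical figures: factory P has 50 employees averaging Rs. 120,
# factory Q has 150 employees averaging Rs. 100.
print(combined_mean([(50, 120), (150, 100)]))  # 105.0
```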


(vii) It should have sampling stability. Last, but not the least, we
should prefer to get a value which has what the statisticians call ‘sampling
stability’. This means that if we pick 10 different groups of college
students and compute the average of each group, we should expect to get
approximately the same value. It does not mean, however, that there can
be no difference in the values of different samples. There may be some
difference, but those averages for which this difference (technically called
sampling fluctuation) is less are considered better than those for which this
difference is more.


Types of Averages 
The following are the important types of averages : 


A. Arithmetic mean: (i) simple, and (ii) weighted 
B. Median 

C. Mode

D. Geometric mean 

E. Harmonic mean 


Besides these, there are less important averages like moving average,
progressive average, etc. These averages have a very limited field of
application and are, therefore, not so popular.


A. ARITHMETIC MEAN 


The most popular and widely used measure for representing the
entire data by one value is what most laymen call an ‘average’* and what
the statisticians call the arithmetic mean.† Its value is obtained by adding
together all the items and dividing this total by the number of items.
Arithmetic mean may either be 


(i) simple arithmetic mean, or 
(ii) weighted arithmetic mean. 


Calculation of Simple Arithmetic Mean—Individual Observations 


The process of computing the mean in the case of individual observations
(i.e., where frequencies are not given) is very simple. Add together the
various values of the variable and divide the total by the number of items.
Symbolically :


$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\Sigma X}{N}$$


* It should be noted that statisticians do not like the term ‘average’ because
it has too loose a connotation. It has different meanings, for example, an average person,
an average wage, an average height, etc. It can refer to the mean, median, mode,
geometric mean, harmonic mean or any other average. In practice, the arithmetic mean is
so popular that the word mean or average alone, without qualification, is taken to denote
this particular type of average. That is, when anyone speaks of ‘the mean’ or ‘the
average’ of a series of observations, it may as a rule be assumed that the arithmetic
mean is meant unless otherwise stated.

† The arithmetic mean of a sample is designated by the symbol X̄, read as
‘X-bar’, and the arithmetic mean of a population is designated by the Greek letter μ,
pronounced ‘mu’.
Here X̄ = arithmetic mean; ΣX = sum of all the values of the
variable X, i.e., X₁, X₂, X₃, ..., X_N; N = number of observations.

Steps. The formula involves two steps in calculating the mean :
(i) Add together all the values of the variable X and obtain the total, i.e., ΣX.

(ii) Divide this total by the number of observations, i.e., N.
Illustration 1. The following table gives the monthly income of 12 families :

S. No.                  1     2     3    4    5     6    7    8    9     10   11    12
Monthly income (Rs.)    280   180   96   98   104   75   80   94   100   75   600   200

Calculate the arithmetic mean. (M.A. Econ., Lucknow, 1973)


Solution : CALCULATION OF ARITHMETIC MEAN

S. N.    Monthly Income (Rs.)    S. N.    Monthly Income (Rs.)
1        280                     7        80
2        180                     8        94
3        96                      9        100
4        98                      10       75
5        104                     11       600
6        75                      12       200

N = 12                           ΣX = 1,982

$$\bar{X} = \frac{\Sigma X}{N} = \frac{1{,}982}{12} = 165.17$$

Thus the average income is Rs. 165.17 per month.
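The direct formula translates into a couple of lines of Python; the sketch below simply re-does the Illustration 1 arithmetic (the function name is illustrative).

```python
# Illustration 1: monthly incomes (Rs.) of 12 families
incomes = [280, 180, 96, 98, 104, 75, 80, 94, 100, 75, 600, 200]

def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the items divided by their number."""
    return sum(values) / len(values)

print(sum(incomes), len(incomes))          # 1982 12
print(round(arithmetic_mean(incomes), 2))  # 165.17
```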


Short-cut method. The arithmetic mean can be calculated by
using what is known as an arbitrary origin. Suppose we take any figure A
and write d as the deviation of the variable X from A as follows :

$$d = X - A, \qquad X = A + d$$
$$\Sigma X = NA + \Sigma d, \qquad \bar{X} = A + \frac{\Sigma d}{N}$$

where A is the assumed mean.


Steps. (1) Take an assumed mean*. 


(2) Take the deviation of the items from the assumed mean and 
denote these deviations by d. 


(3) Obtain the sum of these deviations, i.e., Σd.

(4) Apply the formula : X̄ = A + Σd/N.


*Any value, whether existing in the data or not, can be taken as the assumed mean
and the final answer would be the same. However, the nearer the assumed mean is to
the actual mean, the fewer the calculations.
For the above question, calculate the arithmetic mean by taking 150 as the assumed mean.
Solution: CALCULATION OF ARITHMETIC MEAN

S. No.   Monthly Income X (Rs.)   d = (X - 150)
1        280                      +130
2        180                      +30
3        96                       -54
4        98                       -52
5        104                      -46
6        75                       -75
7        80                       -70
8        94                       -56
9        100                      -50
10       75                       -75
11       600                      +450
12       200                      +50
N = 12                            Σd = 182

$$\bar{X} = A + \frac{\sum d}{N} = 150 + \frac{182}{12} = 150 + 15.17 = \text{Rs. } 165.17$$


Note. The reader will find that the calculations here are more than what we had when we used the formula

$$\bar{X} = \frac{\sum X}{N}$$

This is true for ungrouped data. But for grouped data a considerable saving in time is possible by adopting the short-cut method.
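The footnote's claim that any assumed mean gives the same answer can be checked numerically. The sketch below assumes the incomes of Illustration 1; `shortcut_mean` is a hypothetical helper name, and the extra origins 100, 0 and 600 are arbitrary choices made only for the demonstration.

```python
# Short-cut (assumed mean) method for individual observations:
#   d = X - A,  X-bar = A + (sum of d) / N
incomes = [280, 180, 96, 98, 104, 75, 80, 94, 100, 75, 600, 200]  # Rs., Illustration 1


def shortcut_mean(values, assumed_mean):
    """Mean computed from deviations about an arbitrary origin A."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


print(sum(x - 150 for x in incomes))  # sum of d with A = 150: 182

# Different origins give the same mean, apart from floating-point rounding.
for a in (150, 100, 0, 600):
    print(a, round(shortcut_mean(incomes, a), 2))
```

Each line of the loop should show 165.17, matching the direct method.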

Calculation of Arithmetic Mean—Discrete Series 


In discrete series arithmetic mean may be computed by applying 
(i) Direct method, or 
(ii) Short-cut method.


Direct Method. The formula for computing the mean is

$$\bar{X} = \frac{\sum fX}{N}$$

where f = Frequency; X = The variable in question; N* = Total number of observations, i.e., $\sum f$.

Steps: (i) Multiply the frequency of each row by the variable and obtain the total $\sum fX$.

(ii) Divide the total obtained in step (i) by the number of observations, i.e., the total frequency.


*The reader should note carefully that in discrete and continuous frequency distributions the total number of observations is the sum of the frequencies, i.e., $N = \sum f$.
Illustration 2. From the following data of the marks obtained by 60 students of a class, calculate the arithmetic mean:

Marks   No. of Students   Marks   No. of Students
20      8                 50      10
30      12                60      6
40      20                70      4


Solution. Let the marks be denoted by X and the number of students by f.

CALCULATION OF ARITHMETIC MEAN

Marks X   No. of Students f   fX
20        8                   160
30        12                  360
40        20                  800
50        10                  500
60        6                   360
70        4                   280
          N = 60              ΣfX = 2,460

$$\bar{X} = \frac{\sum fX}{N} = \frac{2{,}460}{60} = 41$$

Hence the average marks = 41.
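The direct method for a discrete series can be sketched as follows. It is an illustrative sketch assuming the values and frequencies of Illustration 2 are kept in two parallel lists; `discrete_direct_mean` is a hypothetical name.

```python
# Direct method for a discrete series: X-bar = (sum of f*X) / N, where N = sum of f
marks = [20, 30, 40, 50, 60, 70]     # X, Illustration 2
students = [8, 12, 20, 10, 6, 4]     # f


def discrete_direct_mean(xs, freqs):
    """Multiply each value by its frequency, total the products, divide by N."""
    n = sum(freqs)                                  # N = sum of f
    sum_fx = sum(f * x for x, f in zip(xs, freqs))  # sum of fX
    return sum_fx / n


print(sum(students), discrete_direct_mean(marks, students))  # 60 41.0
```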
Short-cut Method. According to this method,

$$\bar{X} = A + \frac{\sum fd}{N}$$

where A = Assumed mean; d = (X - A); N = Total number of observations, i.e., $\sum f$.


Steps. (i) Take an assumed mean.

(ii) Take the deviations of the variable X from the assumed mean and denote the deviations by d.

(iii) Multiply these deviations by the respective frequencies and take the total $\sum fd$.

(iv) Divide the total obtained in the third step by the total frequency and add the result to the assumed mean.


Illustration 3. Calculate the arithmetic mean by the short-cut method using the frequency distribution of Illustration 2.

Solution: CALCULATION OF ARITHMETIC MEAN

Marks X   No. of students f   d = (X - 40)   fd
20        8                   -20            -160
30        12                  -10            -120
40        20                  0              0
50        10                  +10            +100
60        6                   +20            +120
70        4                   +30            +120
          N = 60                             Σfd = 60

$$\bar{X} = A + \frac{\sum fd}{N} = 40 + \frac{60}{60} = 40 + 1 = 41$$
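Here is the same calculation in code, as a sketch assuming the data of Illustration 2 and the assumed mean A = 40 used above. The helper `discrete_shortcut_mean` is hypothetical; it returns the sum of fd alongside the mean so the intermediate total can be compared with the table.

```python
marks = [20, 30, 40, 50, 60, 70]     # X, Illustration 2
students = [8, 12, 20, 10, 6, 4]     # f


def discrete_shortcut_mean(xs, freqs, assumed_mean):
    """Short-cut method: X-bar = A + (sum of f*d) / N, with d = X - A."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, freqs))
    return assumed_mean + sum_fd / n, sum_fd


mean, sum_fd = discrete_shortcut_mean(marks, students, 40)
print(sum_fd, mean)  # 60 41.0
```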


Calculation of Arithmetic Mean—Continuous Series*


In continuous series, arithmetic mean may be computed by applying 
any of the following methods : 


(i) Direct method. 

(ii) Short-cut method. 
(iii) Step deviation method. 
Direct Method. When the direct method is used,

$$\bar{X} = \frac{\sum fm}{N}$$

where m† = mid-point of the various classes; f = the frequency of each class; N = the total frequency.


Steps. (i) Obtain the mid-point of each class and denote it by m.

(ii) Multiply these mid-points by the respective frequency of each class and obtain the total $\sum fm$.

(iii) Divide the total obtained in step (ii) by the sum of the frequencies, i.e., N.


Illustration 4. (a) From the following data compute the arithmetic mean by the direct method:

Marks             0-10   10-20   20-30   30-40   40-50   50-60
No. of students   5      10      25      30      20      10

Solution: CALCULATION OF ARITHMETIC MEAN BY DIRECT METHOD

Marks   No. of students f   Mid-points m   fm
0-10    5                   5              25
10-20   10                  15             150
20-30   25                  25             625
30-40   30                  35             1,050
40-50   20                  45             900
50-60   10                  55             550
        N = 100                            Σfm = 3,300

$$\bar{X} = \frac{\sum fm}{N} = \frac{3{,}300}{100} = 33$$
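A sketch of the direct method for a continuous series, assuming classes are given as (lower, upper) pairs as in Illustration 4(a). The mid-point rule follows the footnote on this page; the names `midpoints` and `grouped_direct_mean` are hypothetical.

```python
# Direct method for a continuous series: each class is represented by its mid-point m,
# and X-bar = (sum of f*m) / N.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]  # marks, Illustration 4(a)
freqs = [5, 10, 25, 30, 20, 10]


def midpoints(class_limits):
    """Mid-point = (lower limit + upper limit) / 2 for each class."""
    return [(lower + upper) / 2 for lower, upper in class_limits]


def grouped_direct_mean(class_limits, freqs):
    mids = midpoints(class_limits)
    sum_fm = sum(f * m for f, m in zip(freqs, mids))  # sum of fm
    return sum_fm / sum(freqs)


print(midpoints(classes))                   # [5.0, 15.0, 25.0, 35.0, 45.0, 55.0]
print(grouped_direct_mean(classes, freqs))  # 33.0
```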


Short-cut Method. When the short-cut method is used, the arithmetic mean is computed by applying the following formula:

$$\bar{X} = A + \frac{\sum fd}{N}$$


*For the sake of clarity and ease of understanding we have used the terms individual observations, discrete series and continuous series throughout the text. However, the reader should be familiar with the terms grouped data and ungrouped data also. Ungrouped data refer to individual observations, whereas grouped data refer to continuous series and discrete series.

†Mid-point $m = \dfrac{\text{Lower limit} + \text{Upper limit}}{2}$
where A = assumed mean; d = deviations of mid-points from the assumed mean, i.e., (m - A); N = total number of observations.


Steps. (i) Take an assumed mean. 


(ii) From the mid-point of each class deduct the assumed mean. 

(iii) Multiply the respective frequencies of each class by these deviations and obtain the total $\sum fd$.

(iv) Apply the formula: $\bar{X} = A + \dfrac{\sum fd}{N}$.

Calculate the arithmetic mean by the short-cut method from the data of Illustration 4(a).


CALCULATION OF ARITHMETIC MEAN

Marks   No. of students f   Mid-points m   d = (m - 35)   fd
0-10    5                   5              -30            -150
10-20   10                  15             -20            -200
20-30   25                  25             -10            -250
30-40   30                  35             0              0
40-50   20                  45             +10            +200
50-60   10                  55             +20            +200
        N = 100                                           Σfd = -200

$$\bar{X} = A + \frac{\sum fd}{N} = 35 + \frac{-200}{100} = 35 - 2 = 33$$
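The short-cut method for class mid-points can be sketched like this, assuming the mid-points of Illustration 4(a) and A = 35 as in the table above; `grouped_shortcut_mean` is a hypothetical name.

```python
mids = [5, 15, 25, 35, 45, 55]       # class mid-points m, Illustration 4(a)
freqs = [5, 10, 25, 30, 20, 10]      # f


def grouped_shortcut_mean(mids, freqs, assumed_mean):
    """X-bar = A + (sum of f*d) / N, with d = m - A."""
    sum_fd = sum(f * (m - assumed_mean) for m, f in zip(mids, freqs))
    return assumed_mean + sum_fd / sum(freqs), sum_fd


mean, sum_fd = grouped_shortcut_mean(mids, freqs, 35)
print(sum_fd, mean)  # -200 33.0
```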


Step Deviation Method. In the step deviation method the only additional point is that, in order to simplify calculations, we take a common factor from the data and multiply the result by the common factor. The formula is

$$\bar{X} = A + \frac{\sum fd'}{N} \times C$$

where A = assumed mean; f = frequency; $d' = \dfrac{m - A}{C}$; N = total frequency; C = common factor.


The symbol d' has been introduced to distinguish it from d. d denotes deviations from the assumed mean, whereas d' denotes deviations from the assumed mean after dividing by a common factor.

"The last formula is, for practical reasons, the most important of the three. While the first two are necessary for cases where intervals are unequal, the last one is generally used when class intervals are equal. Not only is the last formula the more common one, and one which reduces the amount of arithmetic involved, but it is also directly connected to the formula usually used to calculate the standard deviation, explained in the next chapter. A thorough comprehension of the technique used in this formula will facilitate understanding of the calculation of the standard deviation, a relatively more difficult measure to grasp."

From the data of Illustration 4(a) compute the arithmetic mean by the step deviation method.

Solution: CALCULATION OF ARITHMETIC MEAN


Marks   No. of students f   Mid-points m   d = (m - 35)   d' = d/10   fd'
0-10    5                   5              -30            -3          -15
10-20   10                  15             -20            -2          -20
20-30   25                  25             -10            -1          -25
30-40   30                  35             0              0           0
40-50   20                  45             10             1           20
50-60   10                  55             20             2           20
        N = 100                                                       Σfd' = -20

$$\bar{X} = A + \frac{\sum fd'}{N} \times C = 35 - \frac{20}{100} \times 10 = 35 - 2 = 33$$
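The step deviation formula can be sketched in code and compared with the direct method on the same data. This assumes A = 35 and C = 10 as in the table; `step_deviation_mean` is a hypothetical helper. Multiplying by C before dividing by N keeps the floating-point arithmetic exact for these figures.

```python
mids = [5, 15, 25, 35, 45, 55]       # class mid-points, Illustration 4(a)
freqs = [5, 10, 25, 30, 20, 10]


def step_deviation_mean(mids, freqs, assumed_mean, common_factor):
    """X-bar = A + (sum of f*d') / N * C, where d' = (m - A) / C."""
    d_prime = [(m - assumed_mean) / common_factor for m in mids]
    sum_fd_prime = sum(f * d for f, d in zip(freqs, d_prime))
    n = sum(freqs)
    return assumed_mean + common_factor * sum_fd_prime / n, sum_fd_prime


mean, sum_fd_prime = step_deviation_mean(mids, freqs, 35, 10)
direct = sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)  # direct method for comparison
print(sum_fd_prime, mean, direct)  # -20.0 33.0 33.0
```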


It is clear from above that all the three methods of finding arithmetic 
mean in continuous series give us the same answer. The direct method, 
though the simplest, involves more calculations when mid-points and 
frequencies are very large in magnitude. For example, observe the 
following data : 


Income in Rs.   No. of persons   Income in Rs.   No. of persons
100-200         368              400-500         567
200-300         472              500-600         304
300-400         969


In this case the step deviation method would be far simpler. In fact, the step deviation method should be adopted wherever possible because it minimises the calculations.


While computing the mean in a continuous series, the mid-points of the various classes are taken as representative of those particular classes. The reason is that when the data are grouped, the exact frequency with which each value of the variable occurs in the distribution is unknown. We only know the limits within which a certain number of frequencies occur. For example, when we say that the number of persons within the income group 100-200 is 50, we cannot say how many persons out of 50 are getting 101, 102, 103, etc. We, therefore, make an assumption while calculating the arithmetic mean that the frequencies within each class are spread evenly over the range of the class interval, i.e., there will be as many items below the mid-point as above it. Unless such an assumption is made, the value of the mean cannot be computed.


This assumption is likely to lead to some error. As a result thereof 
the mean of a number of observations calculated from a frequency distribu- 
tion will generally be only an approximation to the mean calculated from 
the original data. However, the possibility of compensating errors must 
be considered. Some of the mid-points err by being too low and others 
err by being too high. In general, the mid-points of the classes below the 
class containing the arithmetic mean tend to be too low and the mid-points 
of the classes above the class containing the arithmetic mean tend to be 
too high. It is quite possible, therefore, that when the errors are 
summed, those which are too low will offset, in part at least, those which
are too high, so that the arithmetic mean for the entire distribution will 
be approximately the same value as is obtained from a list of values. 
Illustration 4(b). Two hundred people were interviewed by a public opinion polling agency. The following frequency distribution gives the ages of the people interviewed.

Age groups (Years)   Frequency   Age groups (Years)   Frequency
80-89                2           40-49                56
70-79                2           30-39                40
60-69                6           20-29                42
50-59                20          10-19                32

Calculate the value of the mean. (B. Com., Lucknow, 1973)

Solution: CALCULATION OF ARITHMETIC MEAN

Age groups (Years)   f     m      d = (m - 44.5)   d' = d/10   fd'
80-89                2     84.5   40               4           8
70-79                2     74.5   30               3           6
60-69                6     64.5   20               2           12
50-59                20    54.5   10               1           20
40-49                56    44.5   0                0           0
30-39                40    34.5   -10              -1          -40
20-29                42    24.5   -20              -2          -84
10-19                32    14.5   -30              -3          -96
                     N = 200                                   Σfd' = -174

$$\bar{X} = A + \frac{\sum fd'}{N} \times C$$

A = 44.5, $\sum fd'$ = -174, N = 200, C = 10

$$\bar{X} = 44.5 - \frac{174}{200} \times 10 = 44.5 - 8.7 = 35.8 \text{ years}$$
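The sketch below repeats Illustration 4(b) and also checks the Note's claim that adjusting inclusive classes (here by subtracting and adding 0.5, an assumption suited to whole-year ages) leaves the mid-points unchanged. Variable names are illustrative only.

```python
# Illustration 4(b): inclusive age classes, step deviation with A = 44.5, C = 10.
age_groups = [(10, 19), (20, 29), (30, 39), (40, 49),
              (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [32, 42, 40, 56, 20, 6, 2, 2]

# Mid-points of the inclusive classes and of the adjusted (exclusive) classes
inclusive_mids = [(lo + hi) / 2 for lo, hi in age_groups]
adjusted_mids = [((lo - 0.5) + (hi + 0.5)) / 2 for lo, hi in age_groups]
print(inclusive_mids == adjusted_mids)  # True

A, C = 44.5, 10
d_prime = [round((m - A) / C) for m in inclusive_mids]  # whole steps from -3 to 4
sum_fd_prime = sum(f * d for f, d in zip(freqs, d_prime))
mean_age = A + C * sum_fd_prime / sum(freqs)
print(sum_fd_prime, round(mean_age, 1))  # -174 35.8
```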


Note. When data are given by the inclusive method it is not necessary to adjust the classes for calculating the arithmetic mean, because the mid-points remain the same whether or not the adjustment is made. However, in the case of the median and mode, adjustment is necessary.


Correcting Incorrect Values 


It sometimes happens that, due to an oversight or a mistake in copying, certain wrong items are taken while calculating the mean. The problem is how to find the correct mean. The process is very simple. From the incorrect $\sum X$ deduct the wrong items and add the correct items, and then divide the corrected $\sum X$ by the number of observations. The result so obtained will give the value of the correct mean.


Illustration 5 (a). The average weight for a group of 25 boys was calculated to be 78.4 lb. It was later discovered that one value was misread as 69 lb. instead of the correct value 96 lb. Calculate the correct average. (B. Com. Poona, 1973)

Solution. (a) Since $\bar{X} = \frac{\sum X}{N}$, we have $\sum X = N\bar{X} = 78.4 \times 25 = 1{,}960$

Less incorrect item          69
                          1,891
Add correct item             96
Correct ΣX                1,987

Hence the correct average = $\frac{1{,}987}{25}$ = 79.48 lb.

(b) The mean of 200 items was 50. Later on it was discovered that two items were misread as 92 and 8 instead of 192 and 88. Find out the correct mean.




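Solution. (b) Since $\bar{X} = \frac{\sum X}{N}$, $\sum X = N\bar{X}$.

Here $\bar{X}$ = 50 and N = 200
ΣX = 200 × 50 = 10,000
Less incorrect items (92 + 8)        100
                                   9,900
Add correct items (192 + 88)         280
Correct total                     10,180

and correct mean = $\frac{10{,}180}{200}$ = 50.9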
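The correction procedure reduces to three operations on the total, sketched here with a hypothetical helper `corrected_mean` applied to both parts of Illustration 5. Results are rounded because 78.4 × 25 is not exact in binary floating point.

```python
def corrected_mean(reported_mean, n, wrong_items, correct_items):
    """Rebuild the total (N * X-bar), swap wrong items for correct ones, divide by N."""
    total = reported_mean * n
    total = total - sum(wrong_items) + sum(correct_items)
    return total / n


print(round(corrected_mean(78.4, 25, [69], [96]), 2))         # 79.48
print(round(corrected_mean(50, 200, [92, 8], [192, 88]), 2))  # 50.9
```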


Calculation of Arithmetic Mean in case of Open-end Classes 


Open-end classes are those in which lower limit of the first class 
and the upper limit of the last class are not known. In such a case we 
cannot find out the arithmetic mean unless we make an assumption about 
the unknown limits. The assumption would naturally depend upon the 
class-interval following the first class and preceding the last class. For: 
example, observe the following data : 


Marks      No. of students   Marks      No. of students
Below 10   4                 30-40      15
10-20      6                 40-50      8
20-30      10                Above 50   7


In the above case, since the class interval is uniform, the appropriate assumption would be that the lower limit of the first class is zero and the upper limit of the last class is 60. The first class thus would be 0-10 and the last class 50-60. Observe another case:


Marks      No. of students   Marks       No. of students
Below 10   4                 60-100      1
10-30      6                 Above 100   3
30-60      10


In the above case, since the class interval is 20 in the second class, 30 in the third class and 40 in the fourth class, i.e., it is increasing by 10, the appropriate assumption would be that the lower limit of the first class is zero and the upper limit of the last class is 150. In other words, the first class is 0-10 and the last one 100-150.


If the class intervals vary in width without any regular pattern, no effort should be made to determine the lower limit of the lowest class and the upper limit of the highest class. The use of the median or mode would be better in such a case.


Note. Because of the difficulty of ascertaining the lower limit and upper limit in open-end distributions, it is suggested that in such distributions the arithmetic mean should not be used.


Checking Accuracy of Computations 


Inaccuracies in computations can always arise. However, wherever possible one should check the accuracy of the work as one proceeds, so that at the end of any step in the process one can tell whether or not errors have been made. In the case of the arithmetic mean computed from a frequency table we can apply Charlier's check to prove the accuracy (or demonstrate the inaccuracy) of our arithmetic.
When the test is applied we merely add one column to our table to show the value of f(d' + 1). If our arithmetic is correct, the sum of the last column, $\sum f(d'+1)$, will always be equal to the sum of the totals of the two preceding columns, i.e., $\sum f$ and $\sum fd'$, because f(d' + 1) = fd' + f in every row. In other words, the test is

$$\sum f(d'+1) = \sum fd' + \sum f$$

If this equation is not satisfied, it means that there is some mistake in the computations.


Illustration 6. Calculate the arithmetic mean from the following data and apply Charlier's check for verifying the calculations.

Marks             0-10   10-20   20-30   30-40   40-50
No. of students   8      12      20      6       4

Solution: APPLYING CHARLIER'S TEST

Marks   f    Mid-points m   d' = (m - 25)/10   fd'           f(d' + 1)
0-10    8    5              -2                 -16           -8
10-20   12   15             -1                 -12           0
20-30   20   25             0                  0             20
30-40   6    35             1                  6             12
40-50   4    45             2                  8             12
        N = 50                                 Σfd' = -14    Σf(d' + 1) = 36

$$\sum f(d'+1) = \sum fd' + \sum f \quad\Rightarrow\quad 36 = -14 + 50 = 36$$

Hence our calculations are correct.

$$\bar{X} = A + \frac{\sum fd'}{N} \times C = 25 - \frac{14}{50} \times 10 = 25 - 2.8 = 22.2$$
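Charlier's check is easy to automate alongside the step deviation calculation. The sketch below assumes the data of Illustration 6 with A = 25 and C = 10; the `assert` plays the role of the check, stopping the program if the column totals disagree.

```python
# Illustration 6: step deviations with A = 25, C = 10, plus Charlier's check.
mids = [5, 15, 25, 35, 45]
freqs = [8, 12, 20, 6, 4]
A, C = 25, 10

d_prime = [(m - A) // C for m in mids]                        # exact: each m - A is a multiple of C
sum_f = sum(freqs)                                            # 50
sum_fd = sum(f * d for f, d in zip(freqs, d_prime))           # -14
sum_check = sum(f * (d + 1) for f, d in zip(freqs, d_prime))  # 36

# Charlier's check: sum of f(d'+1) must equal sum of fd' plus sum of f
assert sum_check == sum_fd + sum_f, "arithmetic error somewhere in the table"

mean = A + C * sum_fd / sum_f
print(sum_fd, sum_check, round(mean, 1))  # -14 36 22.2
```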
Mathematical Properties of the Arithmetic Mean 


The following are a few important mathematical properties of the 
arithmetic mean : 

1. The sum of the deviations of the items from the arithmetic mean (taking signs into account) is always zero, i.e., $\sum (X - \bar{X}) = 0$.* This would be clear from the following example:

X            (X - X̄)
10           -20
20           -10
30           0
40           +10
50           +20
ΣX = 150     Σ(X - X̄) = 0


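*Algebraically the property $\sum (X - \bar{X}) = 0$ is derived from the fact that $N\bar{X} = \sum X$.

Proof: $\sum (X - \bar{X}) = (X_1 - \bar{X}) + (X_2 - \bar{X}) + \cdots + (X_N - \bar{X}) = (X_1 + X_2 + \cdots + X_N) - N\bar{X} = \sum X - N\bar{X} = \sum X - \sum X = 0$.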
Here $\bar{X} = \frac{\sum X}{N} = \frac{150}{5} = 30$. When the sum of the deviations from the actual mean, i.e., 30, is taken, it comes out to be zero. It is because of this property that the mean is characterised as a point of balance, i.e., the sum of the positive deviations from it is equal to the sum of the negative deviations from it.


2. The sum of the squared deviations of the items from the arithmetic mean is minimum, that is, less than the sum of the squared deviations of the items from any other value. The following example would clarify the point (here $\bar{X} = 20/5 = 4$):

X           (X - X̄)          (X - X̄)²
2           -2               4
3           -1               1
4           0                0
5           1                1
6           2                4
ΣX = 20     Σ(X - X̄) = 0     Σ(X - X̄)² = 10


The sum of the squared deviations is equal to 10 in the above case. If the deviations are taken from any other value, the sum of the squared deviations would be greater than 10. For example, let us calculate the squares of the deviations of the items from a value less than the arithmetic mean, say, 3.


X     (X - 3)     (X - 3)²
2     -1          1
3     0           0
4     1           1
5     2           4
6     3           9
                  Σ(X - 3)² = 15


It is clear that $\sum (X - 3)^2 = 15$ is greater than 10. This property, that the sum of the squared deviations of the items is least when they are taken from the mean, is of immense use in regression analysis, which shall be discussed later.

3. Since $\bar{X} = \frac{\sum X}{N}$, $N\bar{X} = \sum X$.

In other words, if we replace each item in the series by the mean, then the sum of these substitutions will be equal to the sum of the individual items. For example, in the discussion of the first property $\sum X = 150$ and the arithmetic mean is 30. If for each item we substitute 30, we get the same total, i.e., 30 + 30 + 30 + 30 + 30 = 150.


This property is of great practical value. For example, if we know the average wage in a factory, say, Rs. 200, and the number of workers employed, say, 50, we can compute the total wage bill from the relation $N\bar{X} = \sum X$. The total wage bill in this case would be 200 × 50, i.e., Rs. 10,000, which is equal to $\sum X$.

4. If we have the arithmetic mean and number of items of two or more related groups, we can compute the combined average of these groups by applying the following formula:

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

where $\bar{X}_{12}$ = combined mean of the two groups; $\bar{X}_1$ = arithmetic mean of the first group; $\bar{X}_2$ = arithmetic mean of the second group; $N_1$ = number of items in the first group; $N_2$ = number of items in the second group.

Illustration 7 below shows the application of the above formula.

5. If the given observations on X are changed to observations on Y = a + bX, then $\bar{Y} = a + b\bar{X}$.

Illustration 7. There are two branches of an establishment employing 100 and 80 persons respectively. If the arithmetic means of the monthly salaries paid by the two branches are Rs. 275 and Rs. 225 respectively, find the arithmetic mean of the salaries of the employees of the establishment as a whole.

Solution. We should compute the combined mean. The formula is:

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

Here, $N_1 = 100$, $\bar{X}_1 = 275$, $N_2 = 80$, $\bar{X}_2 = 225$. Substituting the values, we get

$$\bar{X}_{12} = \frac{(100 \times 275) + (80 \times 225)}{100 + 80} = \frac{27{,}500 + 18{,}000}{180} = \frac{45{,}500}{180} = \text{Rs. } 252.78$$

If we have to find the combined mean of three sub-groups, the above formula can be extended as follows:

$$\bar{X}_{123} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3}$$
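The five mathematical properties can be verified numerically on the small series used above. This is only an illustrative check, not a proof; `mean`, `sum_sq_dev` and `combined_mean` are hypothetical helpers, the grid of trial values 0.0 to 10.0 is an arbitrary choice, and the constants a = 5, b = 2 for property 5 are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


# Property 1: deviations from the mean sum to zero
xs = [10, 20, 30, 40, 50]
print(sum(x - mean(xs) for x in xs))  # 0.0


# Property 2: squared deviations are smallest about the mean
def sum_sq_dev(values, about):
    return sum((v - about) ** 2 for v in values)


ys = [2, 3, 4, 5, 6]
print(sum_sq_dev(ys, mean(ys)), sum_sq_dev(ys, 3))  # 10.0 15
grid = [c / 10 for c in range(0, 101)]              # trial values 0.0, 0.1, ..., 10.0
best = min(grid, key=lambda c: sum_sq_dev(ys, c))
print(best)                                         # 4.0

# Property 3: N * X-bar reproduces the total
print(len(xs) * mean(xs) == sum(xs))  # True


# Property 4: combined mean of groups given as (N_i, mean_i) pairs
def combined_mean(groups):
    return sum(n * m for n, m in groups) / sum(n for n, _ in groups)


print(round(combined_mean([(100, 275), (80, 225)]), 2))  # 252.78 (Illustration 7)

# Property 5: Y = a + bX gives Y-bar = a + b * X-bar
a, b = 5, 2  # hypothetical constants
print(mean([a + b * x for x in xs]) == a + b * mean(xs))  # True
```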
Merits and Limitations of Arithmetic Mean 

Merits. The arithmetic mean is most widely used in practice because of the following reasons:

1. It is the simplest average to understand and the easiest to compute. Neither the arraying of data, as required for calculating the median, nor the grouping of data, as required for calculating the mode, is needed while calculating the mean.

2. It is affected by the value of every item in the series.

3. It is defined by a rigid mathematical formula, with the result that everyone who computes the average gets the same answer.


4. Being determined by a rigid formula, it lends itself to subsequent 
algebraic treatment better than the median or mode. 




5. It is relatively reliable in the sense that it does not vary too much when repeated samples are taken from one and the same population, at least not as much as some other kinds of statistical descriptions.
6. The mean is typical in the sense that it is the centre of gravity, balancing the values on either side of it.

7. It is a calculated value, and not based on position in the series.


Limitations. 1. Since the value of the mean depends upon each and every item of the series, extreme items, i.e., very small and very large items, unduly affect the value of the average. For example, if in a tutorial group there are 4 students and their marks in a test are 60, 70, 10 and 80, the average marks would be $\frac{60 + 70 + 10 + 80}{4} = \frac{220}{4} = 55$. One single item, i.e., 10, has reduced the average marks considerably. The smaller the number of observations, the greater is likely to be the impact of extreme values.


2. In a distribution with open-end classes the value of the mean cannot be computed without making assumptions regarding the size of the class interval of the open-end classes. If such classes contain a large proportion of the values, then the mean may be subject to substantial error. However, the values of the median and mode can be computed where there are open-end classes without making any assumptions about the size of the class interval.

3. The arithmetic mean is not always a good measure of central tendency. The mean provides a "characteristic" value, in the sense of indicating where most of the values lie, only when the distribution of the variable is reasonably normal (bell-shaped). In the case of a U-shaped distribution the mean is not likely to serve a useful purpose.

Weighted Arithmetic Mean 

One of the limitations of the arithmetic mean discussed above is that it gives equal importance to all the items. But there are cases where the relative importance of the different items is not the same. When this is so, we compute the weighted arithmetic mean. The term 'weight' stands for the relative importance of the different items. The formula for computing the weighted arithmetic mean is:

$$\bar{X}_w = \frac{\sum WX}{\sum W}$$

where $\bar{X}_w$ represents the weighted arithmetic mean; W = Weights; X = The variable.

Steps. (i) Multiply the weights by the variable X and obtain the total $\sum WX$.

(ii) Divide this total by the sum of the weights.

An important problem that arises while using the weighted mean is the selection of weights. Weights may be either actual or arbitrary, i.e., estimated. Needless to say, actual weights are the best choice when they are available. However, in the absence of actual weights, arbitrary or imaginary weights may be used. The use of arbitrary weights may lead to some error, but this is better than using no weights at all. In practice, it is found that if weights are intelligently assigned, keeping the phenomena in view, the error involved will be so small that it can be easily overlooked.

*The mean of a frequency distribution, $\bar{X} = \frac{\sum fX}{N}$, is in fact a mean of the X's (class mid-points) where each X is weighted by its importance, i.e., its frequency. This is only a special case of the more general notion of the weighted mean, $\bar{X}_w = \frac{\sum WX}{\sum W}$.


The weighted mean is especially useful in problems relating to the construction of index numbers and standardized birth and death rates.


Illustration 8. (a) A contractor employs three types of workers: male, female and children. To a male worker he pays Rs. 10 per day, to a female worker Rs. 8 per day and to a child worker Rs. 3 per day. What is the average wage per day paid by the contractor?

Solution. The average wage is not the simple arithmetic mean, i.e., $\frac{10 + 8 + 3}{3}$ = Rs. 7 per day. If we assume that the number of male, female and child workers is the same, this answer would be correct. For example, if we take 10 workers in each case, then the mean wage would be

$$\frac{(10 \times 10) + (10 \times 8) + (10 \times 3)}{10 + 10 + 10} = \frac{100 + 80 + 30}{30} = \text{Rs. } 7$$

However, the number of male, female and child workers employed is generally different. If we know how many workers of each type are employed by the contractor in question, those actual numbers should be used as weights. In the absence of this information we take assumed weights. Let us assume that the numbers of male, female and child workers employed are 20, 15 and 5 respectively. The average wage would be the weighted mean calculated as follows:


Wages per day (Rs.)   No. of workers
X                     W                WX
10                    20               200
8                     15               120
3                     5                15
                      ΣW = 40          ΣWX = 335

$$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{335}{40} = \text{Rs. } 8.375 \approx \text{Rs. } 8.38$$


(b) A train runs 25 miles at a speed of 30 m.p.h., another 50 miles at a speed of 40 m.p.h., then, due to repairs of the track, travels for 6 minutes at a speed of 10 m.p.h., and finally covers the remaining distance of 24 miles at a speed of 24 m.p.h. What is the average speed in miles per hour? (B. Com., Delhi, 1968)


Solution. (b) Time taken in covering 25 miles at a speed of 30 m.p.h. = 50 minutes
Time taken in covering 50 miles at a speed of 40 m.p.h. = 75 minutes
Distance covered in 6 minutes at a speed of 10 m.p.h. = 1 mile
Time taken in covering 24 miles at a speed of 24 m.p.h. = 60 minutes

Therefore, taking the time taken as weights, we have the weighted mean as follows:

Speed in m.p.h.   Time in minutes
X                 W                 WX
30                50                1,500
40                75                3,000
10                6                 60
24                60                1,440
                  ΣW = 191          ΣWX = 6,000

∴ Average speed = $\frac{\sum WX}{\sum W} = \frac{6{,}000}{191}$ = 31.4 m.p.h.
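Both parts of Illustration 8 use the same weighted-mean formula, which the following sketch applies with a hypothetical helper `weighted_mean`. For part (b) the minutes are derived from distance and speed, and the result is cross-checked against total distance divided by total time, which is why time (not distance) is the right weight for averaging speeds.

```python
def weighted_mean(values, weights):
    """X-bar_w = (sum of W*X) / (sum of W)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Illustration 8(a): daily wages of male, female and child workers (Rs.)
wages = [10, 8, 3]
print(sum(wages) / len(wages))             # 7.0   simple mean
print(weighted_mean(wages, [10, 10, 10]))  # 7.0   equal weights
print(weighted_mean(wages, [20, 15, 5]))   # 8.375 assumed weights 20, 15, 5

# Illustration 8(b): speeds weighted by the time spent at each speed
speeds = [30, 40, 10, 24]                                  # m.p.h.
minutes = [25 * 60 / 30, 50 * 60 / 40, 6, 24 * 60 / 24]    # [50.0, 75.0, 6, 60.0]
avg_speed = weighted_mean(speeds, minutes)
print(round(avg_speed, 2))                                 # 31.41

# Cross-check: total distance over total time gives the same figure
total_miles = 25 + 50 + 10 * 6 / 60 + 24                   # 100.0
print(round(total_miles / (sum(minutes) / 60), 2))         # 31.41
```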


Illustration 9. Comment on the performance of the students of the three universities given below using simple and weighted averages:

                 Bombay                    Calcutta                  Madras
Course of        % of    No. of students   % of    No. of students   % of    No. of students
Study            Pass    (in hundreds)     Pass    (in hundreds)     Pass    (in hundreds)
M.A.             71      3                 82      2                 81      2
M. Com.          83      4                 76      3                 76      3.5
B.A.             73      5                 73      6                 74      4.5
B. Com.          74      2                 76      7                 58      2
B.Sc.            65      3                 65      3                 70      7
M.Sc.            66      3                 60      7                 73      2


Solution: CALCULATION OF SIMPLE AND WEIGHTED ARITHMETIC MEANS

(X = % of pass; W = number of students in hundreds)

             Bombay               Calcutta             Madras
Course       X     W     WX       X     W     WX       X     W     WX
M.A.         71    3     213      82    2     164      81    2     162
M. Com.      83    4     332      76    3     228      76    3.5   266
B.A.         73    5     365      73    6     438      74    4.5   333
B. Com.      74    2     148      76    7     532      58    2     116
B.Sc.        65    3     195      65    3     195      70    7     490
M.Sc.        66    3     198      60    7     420      73    2     146
Total        432   20    1,451    432   28    1,977    432   21    1,513

Simple and Weighted Arithmetic Means

Bombay: $\bar{X} = \frac{\sum X}{N} = \frac{432}{6} = 72$; $\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{1{,}451}{20} = 72.55$

Calcutta: $\bar{X} = \frac{432}{6} = 72$; $\bar{X}_w = \frac{1{,}977}{28} = 70.61$

Madras: $\bar{X} = \frac{432}{6} = 72$; $\bar{X}_w = \frac{1{,}513}{21} = 72.05$


The arithmetic mean is the same for all the three universities, i.e., 72, and hence it may be concluded that the performance of students is alike. But this will be a wrong conclusion, because what we should compare here are the weighted arithmetic means. On comparing the weighted arithmetic means we find that for Bombay the mean value is the highest, and hence we can say that in Bombay University the performance of students is the best.
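The comparison can be reproduced in code, assuming the table values above (with the Madras M. Com. and B.A. enrolments read as 3.5 and 4.5 hundred, as in the solution table). The helper `simple_and_weighted` and the dictionary layout are hypothetical. The printed weighted figures should come out as 72.55, 70.61 and 72.05, while every simple mean is 72.00.

```python
# (pass %, students in hundreds) per course: M.A., M.Com., B.A., B.Com., B.Sc., M.Sc.
results = {
    "Bombay":   [(71, 3), (83, 4), (73, 5), (74, 2), (65, 3), (66, 3)],
    "Calcutta": [(82, 2), (76, 3), (73, 6), (76, 7), (65, 3), (60, 7)],
    "Madras":   [(81, 2), (76, 3.5), (74, 4.5), (58, 2), (70, 7), (73, 2)],
}


def simple_and_weighted(rows):
    """Return (simple mean of pass %, mean weighted by number of students)."""
    pass_pct = [x for x, _ in rows]
    simple = sum(pass_pct) / len(pass_pct)
    weighted = sum(w * x for x, w in rows) / sum(w for _, w in rows)
    return simple, weighted


summary = {u: simple_and_weighted(rows) for u, rows in results.items()}
for university, (simple, weighted) in summary.items():
    print(f"{university:9s} simple={simple:.2f} weighted={weighted:.2f}")

best = max(summary, key=lambda u: summary[u][1])
print("Highest weighted pass rate:", best)
```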


Crude and Standardized Death Rates*
The crude and standardized death rates are expressed per thousand.

*For a detailed discussion of crude and standardized rates, please refer to the chapter on 'Vital Statistics'.
The crude death rate for a given city or district is

$$\text{C.D.R.} = \frac{N}{P} \times 1{,}000$$

where C.D.R. = Crude death rate; N = Number of deaths; P = Population.


However, as between one city and another, or one district and another, the C.D.R. affords no real comparison, as the proportions of the different age groups in the local population may differ considerably and different age groups naturally exhibit different mortality rates. The crude death rates are, therefore, standardized by applying the specific rates for each age group in each district to the age groups of a standard population.


The standardized death rate can be obtained by multiplying the standard population of each age group by the specific death rate of the local population and dividing this total by the standard population.

Symbolically,

$$\text{S.D.R.} = \frac{\sum WX}{\sum W}$$

where S.D.R. = Standardized Death Rate; W = Weights, i.e., the standard population; X = The variable, i.e., the Specific Death Rate.


The following examples would illustrate the procedure. 


Illustration 10. Calculate the Crude and Standardized Death Rates of the local population from the following data and compare them with the Crude Death Rate of the standard population. What inference do you draw from the comparison?

Age Groups   Standard Population      Local Population
(Years)      Population   Deaths      Population   Deaths
0-10         600          18          400          16
10-20        1,000        5           1,500        6
20-60        3,000        24          2,400        24
60-100       400          20          700          21
Total        5,000        67          5,000        67


Solution: CALCULATION OF CRUDE AND STANDARDIZED DEATH RATES

             Standard Population                    Local Population
Age Groups   Population   Deaths   Death rate      Population   Deaths   Death rate      WX
(Years)      W                     per 1,000                             per 1,000 X
0-10         600          18       30              400          16       40              24,000
10-20        1,000        5        5               1,500        6        4               4,000
20-60        3,000        24       8               2,400        24       10              30,000
60-100       400          20       50              700          21       30              12,000
Total        5,000        67                       5,000        67                       ΣWX = 70,000

C.D.R. of standard population = $\frac{67}{5{,}000} \times 1{,}000 = 13.4$

C.D.R. of local population = $\frac{67}{5{,}000} \times 1{,}000 = 13.4$

However, for purposes of comparison we have to compute standardized death rates:

$$\text{S.D.R.} = \frac{\sum WX}{\sum W}$$

where W = Weight, i.e., the standard population in this case; X = The variable, i.e., the death rate of the local population.

S.D.R. of local population = $\frac{70{,}000}{5{,}000} = 14.0$

The S.D.R. of the standard population would remain the same as the C.D.R. of the standard population.

S.D.R. of standard population = 13.4
S.D.R. of local population = 14.0


Since the S.D.R. of the standard population is lower, we conclude that the standard population is healthier compared to the local population.
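The crude and standardized rates of Illustration 10 can be sketched as below. The function names are hypothetical; rates are formed as 1,000 × deaths / population so that the floating-point results are exact for this data.

```python
def rates_per_thousand(deaths, population):
    """Age-specific death rates per 1,000."""
    return [1000 * d / p for d, p in zip(deaths, population)]


def crude_rate(deaths, population):
    """C.D.R. = total deaths / total population * 1,000."""
    return 1000 * sum(deaths) / sum(population)


def standardized_rate(specific_rates, standard_population):
    """S.D.R. = sum(W*X) / sum(W), W = standard population, X = specific rate."""
    weighted = sum(w * x for w, x in zip(standard_population, specific_rates))
    return weighted / sum(standard_population)


# Illustration 10, age groups 0-10, 10-20, 20-60, 60-100
std_pop, std_deaths = [600, 1000, 3000, 400], [18, 5, 24, 20]
local_pop, local_deaths = [400, 1500, 2400, 700], [16, 6, 24, 21]

local_rates = rates_per_thousand(local_deaths, local_pop)                  # [40.0, 4.0, 10.0, 30.0]
print(crude_rate(std_deaths, std_pop), crude_rate(local_deaths, local_pop))  # 13.4 13.4
print(standardized_rate(local_rates, std_pop))                             # 14.0
print(standardized_rate(rates_per_thousand(std_deaths, std_pop), std_pop))  # 13.4
```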


Illustration 11. Given the following data:

               Population A                    Population B
Age Groups     Age            Death rate       Age            Death rate
               Composition    per '000         Composition    per '000
               W                                              X              WX
0-5            75             25               50             30             2,250
5-15           250            5                260            6              1,500
15-65          600            7                630            8              4,800
65 and over    75             65               60             70             5,250
Total          ΣW = 1,000     12.2             1,000          12.3           ΣWX = 13,800

(The totals in the death-rate columns are the general, i.e., crude, death rates of the two populations.)

Do you consider that the two populations are almost equally healthy, because the general death rates are nearly the same? (M. Com. Delhi, 1967)


Solution. The general death rate, or the crude death rate, is about the same for the two populations. But on the basis of this we cannot say that the two populations are almost equally healthy, because of differences in age composition. For comparison we have to compute the S.D.R. Let us take population A as standard. The C.D.R. would, therefore, become the standardized death rate for this population.

$$\text{S.D.R. (population B)} = \frac{\sum WX}{\sum W} = \frac{13{,}800}{1{,}000} = 13.8$$

S.D.R. for population A = 12.2
S.D.R. for population B = 13.8

Hence population A is healthier compared to population B.
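The figures of Illustration 11 can be checked with one weighted-rate helper (hypothetical name `weighted_rate`). The 15-65 entries are taken as 600 (composition of A) and 8 per thousand (death rate of B), the values consistent with the book's totals of 1,000, 12.2, 12.3 and 13,800.

```python
def weighted_rate(composition, rates):
    """Weighted death rate: sum(W*X) / sum(W)."""
    return sum(w * x for w, x in zip(composition, rates)) / sum(composition)


# Illustration 11: age composition (per 1,000) and death rates (per 1,000)
# age groups: 0-5, 5-15, 15-65, 65 and over
comp_a, rates_a = [75, 250, 600, 75], [25, 5, 7, 65]
comp_b, rates_b = [50, 260, 630, 60], [30, 6, 8, 70]

print(weighted_rate(comp_a, rates_a), weighted_rate(comp_b, rates_b))  # 12.2 12.3 (crude rates)
print(weighted_rate(comp_a, rates_b))  # 13.8 (B's rates applied to A's composition)
```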
B. MEDIAN 

The median by definition refers to the middle value in a distribution. In the case of the median, one-half of the items in the distribution have a value the size of the median value or smaller, and one-half have a value the size of the median value or larger.


As distinct from the arithmetic mean, which is calculated from the value of every item in the series, the median is what is called a position average. The term 'position' refers to the place of a value in a series. The place of the median in a series is such that an equal number of items lie on either side of it. For example, if the income of five persons is
120, 150, 160, 180, then the median income would be Rs. 150. The median would still be 150 if we changed the series to 10, 25, 150, 200, 500. In contrast, in the case of the arithmetic mean, a change in the value of a single item would cause the value of the mean to change. The median is thus the central value of the distribution, or the value that divides the distribution into two equal parts.* If there is an even number of items in a series, there is no actual value exactly in the middle of the series and as such the median is indeterminate. In this case the median is arbitrarily taken to be half-way between the two middle items. For example, if there are 10 items in a series, the median position is 5.5, that is, the median value is half-way between the values of the items that are 5th and 6th in order of magnitude. Thus when N is odd, the median is an actual value with the remainder of the series in two equal parts on either side of it. If N is even, the median is a derived figure, i.e., half the sum of the two middle values.

Calculation of Median—Individual Observations 


Steps. (i) Arrange the data in ascending or descending order of magnitude. (Both arrangements would give the same answer.)

(ii) In a group composed of an odd number of values such as 7, add 1 to the total number of values and divide by 2. Thus 7 + 1 would be 8, which divided by 2 gives 4, the number of the value, counting from either end of the numerically arranged group, which will be the median value. In a large group the same method would be followed. In a group of 199 items the middle value would be the 100th value. This would be determined by $\frac{199 + 1}{2} = 100$. In the form of a formula,

$$\text{Med.}^{\dagger} = \text{Size of } \left(\frac{N+1}{2}\right)\text{th item}$$


Illustration 12. From the following data of the weekly wages of 7 workers compute the median wage:

Wages (in Rs.)   100   150   80   90   160   200   140

Solution:                    CALCULATION OF MEDIAN

S. No.   Wages arranged in       S. No.   Wages arranged in
         ascending order                  ascending order
  1             80                 5            150
  2             90                 6            160
  3            100                 7            200
  4            140

Median = Size of ((N + 1)/2)th item = (7 + 1)/2 = 4th item.

Size of 4th item = 140. Hence median wage = Rs. 140.


*"The median may be defined as the middlemost or central value of the variable when the values are arranged in order of magnitude, or as the value such that greater and smaller values occur with equal frequency. In the case of a frequency curve the median may be defined as that value of the variable which divides the area of the curve into two equal parts." (Yule and Kendall)


†The abbreviation Med. represents median.
The median wage of Rs. 140 is thus the middlemost item: 3 workers get less than Rs. 140 and an equal number, i.e., 3, get more than Rs. 140.
The procedure for determining the median of an even-numbered group of items is not as obvious as above. If there were, for instance, 10 different values in a group, the median is really not determinable, since both the 5th and 6th values are in the centre. In practice, the median value for a group composed of an even number of items is estimated by finding the arithmetic mean of the two middle values, that is, adding the two values in the middle and dividing by two. Expressed in the form of a formula it amounts to:

Median = Size of ((N + 1)/2)th item.

Thus we find that both when N is odd and when N is even, 1 has to be added to determine the median value.


Illustration 13. Obtain the value of median from the following data:

391, 384, 591, 407, 672, 522, 777, 753, 2,488, 1,490.      (C. A. May, 1969)

Solution:                    CALCULATION OF MEDIAN

S. No.   Data arranged in        S. No.   Data arranged in
         ascending order                  ascending order
               X                                 X
  1           384                  6            672
  2           391                  7            753
  3           407                  8            777
  4           522                  9          1,490
  5           591                 10          2,488

Median = Size of ((N + 1)/2)th item = (10 + 1)/2 = 5.5th item.

Size of 5.5th item = (5th item + 6th item)/2 = (591 + 672)/2 = 1,263/2 = 631.5.
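The position rule used in Illustrations 12 and 13 can be expressed as a short Python function. This is a minimal sketch, not part of the original text. The function name `median_individual` is hypothetical, and the data are assumed to be plain numbers.

```python
def median_individual(values):
    """Median of individual observations, following the (N + 1)/2 rule."""
    data = sorted(values)                 # step (i): arrange in order of magnitude
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    mid = n // 2                          # 0-based index of the middle item
    if n % 2 == 1:                        # N odd: the ((N + 1)/2)th item itself
        return data[mid]
    # N even: (N + 1)/2 falls half-way, so average the two middle items
    return (data[mid - 1] + data[mid]) / 2


wages = [100, 150, 80, 90, 160, 200, 140]                      # Illustration 12
data = [391, 384, 591, 407, 672, 522, 777, 753, 2488, 1490]    # Illustration 13
print(median_individual(wages))   # 140
print(median_individual(data))    # 631.5
```

Since the values are sorted inside the function, the order in which they are supplied does not matter. This mirrors the remark that ascending and descending arrangements give the same answer.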


Computation of Median—Discrete Series 


Steps. (i) Arrange the data in ascending or descending order of magnitude.
(ii) Find out the cumulative frequencies.
(iii) Apply the formula: Median = Size of ((N + 1)/2)th item.
(iv) Now look at the cumulative frequency column and find that total which is either equal to (N + 1)/2 or next higher than that, and determine the value of the variable corresponding to it. That gives the value of median.
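Steps (i) to (iv) can be sketched in Python as shown below. The helper name `median_discrete` is hypothetical. It follows the textbook rule literally, returning the value whose cumulative frequency first equals or exceeds (N + 1)/2. The usage line applies it to the data of Illustration 14, which follows.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete series by the cumulative-frequency rule."""
    pairs = sorted(zip(values, freqs))              # step (i): order the values
    if not pairs:
        raise ValueError("empty series")
    cum = list(accumulate(f for _, f in pairs))     # step (ii): c.f. column
    rank = (cum[-1] + 1) / 2                        # step (iii): (N + 1)/2
    for (x, _), cf in zip(pairs, cum):              # step (iv): first c.f. >= rank
        if cf >= rank:
            return x
    raise ValueError("median rank not reached")


print(median_discrete([100, 150, 80, 200, 250, 180], [24, 26, 16, 20, 6, 30]))   # 150
```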
Illustration 14. From the following data find the value of median:

Income (Rs.)       100   150   80   200   250   180
No. of Persons      24    26   16    20     6    30

Solution:                    CALCULATION OF MEDIAN

Income arranged in   No. of persons   c.f.    Income arranged in   No. of persons   c.f.
ascending order            f                  ascending order            f
      80                  16           16            180                 30           96
     100                  24           40            200                 20          116
     150                  26           66            250                  6          122
                                                                                 N = 122

Median = Size of ((N + 1)/2)th item = (122 + 1)/2 = 61.5th item.
Size of 61.5th item = 150.
Hence median income = Rs. 150.
Calculation of Median— Continuous Series 


Steps. Determine the particular class in which the value of median lies. Use N/2 as the rank of the median and not (N + 1)/2.

Some writers have suggested that while calculating median in continuous series 1 should be added to total frequency if it is odd (say, 99) and should not be added if it is an even figure (say, 100). However, 1 is to be added only in the case of individual and discrete series, because specific items are located there. In a continuous series there is no point in trying to use the above rule. Hence, it is N/2 which will divide the area of the curve into two equal parts, and as such we should use N/2 instead of (N + 1)/2 in continuous series. After ascertaining the class in which median lies, the following formula is used for determining the exact value of median:

Median = L + ((N/2 − c.f.)/f) × i

where L = Lower limit of the median class, i.e., the class in which the middle item in the distribution lies;
c.f. = Cumulative frequency of the class preceding the median class, or sum of the frequencies of all classes lower than the median class;
f = Simple frequency of the median class;
i = The class interval of the median class.
The above formula assumes that the frequencies are cumulated starting at the smallest class value. The formula:

Med. = U − ((N/2 − c.f.)/f) × i

where U refers to the upper limit of the median class, and c.f. is now the cumulative frequency of the classes above the median class, is based on the assumption that the frequencies are cumulated starting at the largest class value.

Note. It must be remembered that in interpolating the median value in the frequency distribution it is assumed that the variable is continuous and that there is an orderly and even distribution of items within each class.
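Both interpolation formulas can be sketched in Python. The function names are hypothetical. The class limits and frequencies are assumed to be listed in ascending order, and, as the Note says, items are assumed to be spread evenly within each class. The usage lines apply the functions to Illustration 15, which follows. Its open-end classes 'Below 50' and 'Above 230' are given assumed limits of 30–50 and 230–250; these do not affect the result because the median class is 110–130.

```python
def median_continuous(lower, upper, freqs):
    """Median = L + (N/2 - c.f.)/f * i, cumulating from the smallest class."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    rank = n / 2                                   # N/2, not (N + 1)/2
    cf = 0                                         # c.f. of the classes below
    for L, U, f in zip(lower, upper, freqs):
        if cf + f >= rank:                         # this is the median class
            return L + (rank - cf) / f * (U - L)
        cf += f
    raise ValueError("median class not found")


def median_continuous_from_top(lower, upper, freqs):
    """Med. = U - (N/2 - c.f.)/f * i, cumulating from the largest class."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    rank = n / 2
    cf = 0                                         # c.f. of the classes above
    for L, U, f in reversed(list(zip(lower, upper, freqs))):
        if cf + f >= rank:
            return U - (rank - cf) / f * (U - L)
        cf += f
    raise ValueError("median class not found")


lower = list(range(30, 231, 20))     # 30, 50, ..., 230 (limits 30 and 250 assumed)
upper = [x + 20 for x in lower]
freqs = [1, 16, 39, 58, 60, 46, 22, 15, 15, 9, 10]
print(round(median_continuous(lower, upper, freqs), 2))            # 120.5
print(round(median_continuous_from_top(lower, upper, freqs), 2))   # 120.5
```

Both directions of cumulation locate the same point, which is why the L-form and the U-form give the same answer.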


Illustration 15. From the following data determine the value of median:

Income (Rs.)   Frequency      Income (Rs.)   Frequency
Below 50            1         150–170            22
50–70              16         170–190            15
70–90              39         190–210            15
90–110             58         210–230             9
110–130            60         Above 230          10
130–150            46

Solution:                    CALCULATION OF MEDIAN

Income (Rs.)   Frequency f   c.f.     Income (Rs.)   Frequency f   c.f.
Below 50             1          1     150–170            22         242
50–70               16         17     170–190            15         257
70–90               39         56     190–210            15         272
90–110              58        114     210–230             9         281
110–130             60        174     Above 230          10         291
130–150             46        220
                                                              N = 291

Median = Size of (N/2)th item = 291/2 = 145.5th item.
Hence median lies in the class 110–130.

Med. = L + ((N/2 − c.f.)/f) × i

L = 110, N/2 = 145.5, c.f. = 114, f = 60, i = 20

Med. = 110 + ((145.5 − 114)/60) × 20 = 110 + (31.5 × 20)/60 = 110 + 10.5 = 120.5

Hence median income is Rs. 120.5.
Illustration 16. Calculate median from the following data:

Value          Frequency      Value          Frequency
Less than 10       4          Less than 50       96
Less than 20      16          Less than 60      112
Less than 30      40          Less than 70      120
Less than 40      76          Less than 80      125
                                              (C.A., Nov. 1973)

Solution. Since cumulative frequencies are given, first find the simple frequencies.

                    CALCULATION OF MEDIAN

Value    Frequency   c.f.      Value    Frequency   c.f.
0–10         4          4      40–50       20         96
10–20       12         16      50–60       16        112
20–30       24         40      60–70        8        120
30–40       36         76      70–80        5        125
                                                 N = 125

Median = Size of (N/2)th item = Size of 125/2 = 62.5th item.
Hence median lies in the class 30–40.

Median = L + ((N/2 − c.f.)/f) × i

L = 30, N/2 = 62.5, c.f. = 40, f = 36, i = 10.

Median = 30 + ((62.5 − 40)/36) × 10 = 30 + 6.25 = 36.25.
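The two steps of this solution can be sketched in Python: first recover the simple frequencies from the 'less than' totals, then interpolate. The function names are hypothetical, and equal class intervals starting at 0 are assumed, as in the table above.

```python
def simple_from_less_than(cum):
    """Simple class frequencies from 'less than' cumulative frequencies."""
    return [c - p for p, c in zip([0] + cum[:-1], cum)]


def median_equal_classes(start, width, freqs):
    """Median = L + (N/2 - c.f.)/f * i for equal classes beginning at `start`."""
    rank = sum(freqs) / 2
    cf = 0
    for k, f in enumerate(freqs):
        if cf + f >= rank:                         # k-th class is the median class
            return start + k * width + (rank - cf) / f * width
        cf += f
    raise ValueError("median class not found")


less_than = [4, 16, 40, 76, 96, 112, 120, 125]    # less than 10, 20, ..., 80
freqs = simple_from_less_than(less_than)
print(freqs)                                      # [4, 12, 24, 36, 20, 16, 8, 5]
print(median_equal_classes(0, 10, freqs))         # 36.25
```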


Illustration 17. Compute median from the following data:

Mid-value   Frequency     Mid-value   Frequency
115             6         165            60
125            25         175            38
135            48         185            22
145            72         195             3
155           116
                                        (C.A., 1972)

Solution. Since we are given the mid-values, we should find out the upper and lower limits of the various classes. The mid-values differ by 10, so each class runs from 5 below to 5 above its mid-value (e.g., 115 gives 110–120).

                    CALCULATION OF MEDIAN

Class intervals   f     c.f.     Class intervals   f     c.f.
110–120           6       6      160–170          60     327
120–130          25      31      170–180          38     365
130–140          48      79      180–190          22     387
140–150          72     151      190–200           3     390
150–160         116     267
                                                  N = 390

Med. = Size of (N/2)th item = Size of 390/2 = 195th item.
Median lies in the class 150–160.

Median = L + ((N/2 − c.f.)/f) × i

L = 150, N/2 = 195, c.f. = 151, f = 116, i = 10

Median = 150 + ((195 − 151)/116) × 10 = 150 + 3.8 = 153.8
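Recovering class limits from mid-values and then interpolating can be sketched in Python. The function names are hypothetical. The sketch assumes the mid-values are equally spaced, so that each class extends half an interval on either side of its mid-value.

```python
def limits_from_midvalues(mids):
    """Class limits from equally spaced mid-values: m - i/2 and m + i/2."""
    i = mids[1] - mids[0]                          # common difference = class interval
    if any(b - a != i for a, b in zip(mids, mids[1:])):
        raise ValueError("mid-values must be equally spaced")
    return [m - i / 2 for m in mids], [m + i / 2 for m in mids]


def median_grouped(lower, upper, freqs):
    """Median = L + (N/2 - c.f.)/f * i."""
    rank = sum(freqs) / 2
    cf = 0
    for L, U, f in zip(lower, upper, freqs):
        if cf + f >= rank:
            return L + (rank - cf) / f * (U - L)
        cf += f
    raise ValueError("median class not found")


mids = [115, 125, 135, 145, 155, 165, 175, 185, 195]
freqs = [6, 25, 48, 72, 116, 60, 38, 22, 3]
lower, upper = limits_from_midvalues(mids)
print(lower[0], upper[0])                              # 110.0 120.0
print(round(median_grouped(lower, upper, freqs), 2))   # 153.79
```

The unrounded result, 153.79, agrees with the 153.8 obtained above.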




Illustration 18. An incomplete distribution is given below:

Variable   Frequency
10–20         12
20–30         30
30–40          ?
40–50         65
50–60          ?
60–70         25
70–80         18
Total        229

You are given that the median value is 46.
(a) Using the median formula fill up the missing frequencies.
(b) Calculate the arithmetic mean of the completed table.      (C.A., 1973)

Solution. (a) Let the frequency of the class 30–40 be f1 and that of 50–60 be f2. The total frequency is 229. The sum of the frequencies of the classes other than the missing ones is (12 + 30 + 65 + 25 + 18) = 150.

∴ f1 + f2 = 229 − 150 = 79

The value of the median is also given.

Median = Size of (N/2)th item = 229/2 = 114.5th item.
Since the median, 46, lies in 40–50, the median class is 40–50.

Median = L + ((N/2 − c.f.)/f) × i

L = 40, N/2 = 114.5, c.f.* = 12 + 30 + f1 = 42 + f1, f = 65, i = 10

46 = 40 + ((114.5 − 42 − f1)/65) × 10
6 × 65 = (72.5 − f1) × 10
390 = 725 − 10f1
f1 = 33.5

Since f1 + f2 = 79, we can find out f2:
f2 = 79 − 33.5 = 45.5

Thus f1 is 33.5 and f2 is 45.5. If whole-number frequencies are wanted they may be taken as 34 and 45, which keeps their sum at 79. The unrounded values are used below.

(b)              COMPUTATION OF ARITHMETIC MEAN

Variable     f      m.v.    d' = (m − 45)/10     fd'
10–20      12.0     15          −3             −36.0
20–30      30.0     25          −2             −60.0
30–40      33.5     35          −1             −33.5
40–50      65.0     45           0               0
50–60      45.5     55           1              45.5
60–70      25.0     65           2              50.0
70–80      18.0     75           3              54.0
        N = 229                            Σfd' = 20

X̄ = A + (Σfd'/N) × C

A = 45, Σfd' = 20, N = 229, C = 10

X̄ = 45 + (20/229) × 10 = 45 + 0.87 = 45.87

* Cumulative frequency of the class preceding the median class is (12 + 30 + f1).
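Part (a) amounts to solving the median formula for the one unknown hidden in c.f., and part (b) is the step-deviation mean. The Python sketch below uses the hypothetical helper name `missing_cf_frequency`. It assumes, as here, that the unknown frequency belongs to a class below the median class.

```python
def missing_cf_frequency(n, median, L, i, f_median, known_cf_below):
    """Solve  median = L + (N/2 - (known_cf_below + x)) / f_median * i  for x."""
    return n / 2 - known_cf_below - (median - L) * f_median / i


N, total_known = 229, 12 + 30 + 65 + 25 + 18
f1 = missing_cf_frequency(N, 46, 40, 10, 65, 12 + 30)   # class 30-40
f2 = (N - total_known) - f1                              # class 50-60
print(f1, f2)                                            # 33.5 45.5

# (b) step-deviation mean: X = A + (sum of f*d' / N) * C
mids = [15, 25, 35, 45, 55, 65, 75]
freqs = [12, 30, f1, 65, f2, 25, 18]
A, C = 45, 10
sum_fd = sum(f * (m - A) / C for f, m in zip(freqs, mids))
print(sum_fd, round(A + sum_fd / N * C, 2))             # 20.0 45.87
```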
Calculation of Median when Class Intervals are Unequal

When the class intervals are unequal, the frequencies need not be adjusted to make the class intervals equal, and the same formula for interpolation can be applied as discussed above.


Illustration 19. Calculate median from the following data:

Marks             0–10   10–30   30–60   60–80   80–90
No. of Students     5      15      30       8       2

Solution:                    CALCULATION OF MEDIAN

Marks     f    c.f.      Marks     f    c.f.
0–10      5      5       60–80     8     58
10–30    15     20       80–90     2     60
30–60    30     50
                                  N = 60

Med. = Size of (N/2)th item = 60/2 = 30th item.
Median lies in the class 30–60.

Med. = L + ((N/2 − c.f.)/f) × i

L = 30, N/2 = 30, c.f. = 20, f = 30, i = 30

Med. = 30 + ((30 − 20)/30) × 30 = 30 + 10 = 40

If we make the class intervals equal, spreading each class frequency evenly over sub-intervals of 10 marks (e.g., the 15 students in 10–30 become 7.5 in 10–20 and 7.5 in 20–30), the same answer will be obtained.

Marks     f     c.f.      Marks     f    c.f.
0–10      5      5        50–60    10     50
10–20    7.5   12.5       60–70     4     54
20–30    7.5    20        70–80     4     58
30–40    10     30        80–90     2     60
40–50    10     40
                                   N = 60

Med. = Size of (N/2)th item = 60/2 = 30th item.
Median lies in the class 30–40.

Med. = L + ((N/2 − c.f.)/f) × i

L = 30, N/2 = 30, c.f. = 20, f = 10, i = 10

Med. = 30 + ((30 − 20)/10) × 10 = 30 + 10 = 40.
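The claim that equalising the class intervals leaves the median unchanged can be checked with a short Python sketch. The helper names are hypothetical. `split_evenly` spreads each class frequency evenly over sub-intervals of a chosen width, and it assumes every class width is a multiple of that width.

```python
def median_grouped(lower, upper, freqs):
    """Median = L + (N/2 - c.f.)/f * i, where i may differ from class to class."""
    rank = sum(freqs) / 2
    cf = 0
    for L, U, f in zip(lower, upper, freqs):
        if cf + f >= rank:
            return L + (rank - cf) / f * (U - L)
        cf += f
    raise ValueError("median class not found")


def split_evenly(lower, upper, freqs, width):
    """Re-express classes as equal intervals of `width`, spreading each
    frequency evenly over its sub-intervals (the assumption made in the text)."""
    new_l, new_u, new_f = [], [], []
    for L, U, f in zip(lower, upper, freqs):
        parts = int((U - L) // width)
        for k in range(parts):
            new_l.append(L + k * width)
            new_u.append(L + (k + 1) * width)
            new_f.append(f / parts)
    return new_l, new_u, new_f


lower, upper, freqs = [0, 10, 30, 60, 80], [10, 30, 60, 80, 90], [5, 15, 30, 8, 2]
print(median_grouped(lower, upper, freqs))                        # 40.0
print(median_grouped(*split_evenly(lower, upper, freqs, 10)))     # 40.0
```

The agreement is not an accident. Under the even-spread assumption, the interpolation inside a wide class gives the same point as interpolation inside the corresponding narrow sub-class.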


Mathematical Property of Median 


The sum of the deviations of the items from the median, ignoring signs, is the least. For example, the median of 4, 6, 8, 10, 12 is 8. The deviations from 8 ignoring signs are 4, 2, 0, 2, 4 and the total is 12. This total is smaller than the one obtained if deviations are taken from any other value. Thus if deviations are taken from 7, the values ignoring signs would be 3, 1, 1, 3, 5 and the total 13.
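The minimising property can be checked numerically. This is an illustrative sketch: the helper name `total_abs_deviation` is hypothetical, and the trial points 0, 0.5, ..., 16 are an arbitrary choice.

```python
def total_abs_deviation(values, a):
    """Sum of |x - a| over the items, i.e., deviations ignoring signs."""
    return sum(abs(x - a) for x in values)


data = [4, 6, 8, 10, 12]
print(total_abs_deviation(data, 8))   # 12 (deviations from the median)
print(total_abs_deviation(data, 7))   # 13
# Scanning trial points 0, 0.5, ..., 16 finds the smallest total at 8.
trials = [k / 2 for k in range(33)]
print(min(trials, key=lambda a: total_abs_deviation(data, a)))   # 8.0
```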
Merits and Limitations of Median

Merits. 1. It is especially useful in case of open-end classes, since only the position and not the values of items must be known. The median is also recommended if the distribution has unequal classes, since it is much easier to compute than the mean.

2. It is not influenced by the magnitude of extreme deviations from it. For example, the median of 10, 20, 30, 40 and 150 would be 30, whereas the mean is 50. Hence very often, when extreme values are present in a set of observations, the median is a more satisfactory measure of central tendency than the mean.

3. In markedly skewed distributions such as income distributions or price distributions, where the arithmetic mean would be distorted by extreme values, the median is especially useful. Consequently the median income may, for some purposes, be regarded as a more representative figure, for half the income earners must be receiving at least the median income. One can say that as many earners are above the median income as are below it.

4. It is the most appropriate average in dealing with qualitative data, i.e., where ranks are given or there are other types of items that are not counted or measured but are scored.

5. The value of median can be determined graphically, whereas the value of mean cannot be graphically ascertained.

6. Perhaps the greatest advantage of median is, however, the fact that the median actually does indicate what many people incorrectly believe the arithmetic mean indicates. The median indicates the value of the middle item in the distribution. This is a clear-cut meaning and makes the median a measure that can be easily explained.


Limitations. 1. For calculating median it is necessary to arrange the data; other averages do not need any arrangement.

2. Since it is a positional average, its value is not determined by each and every observation.

3. It is not capable of algebraic treatment. For example, median cannot be used for determining the combined median of two or more groups, as is possible in case of mean. Similarly, the median wage of a skewed distribution times the number of workers will not give the total payroll. Because of this limitation the median is much less popular compared to the arithmetic mean.

4. The value of median is affected more by sampling fluctuations than the value of arithmetic mean.

5. The median in some cases cannot be computed exactly as can the mean. When the number of items included in a series of data is even, the median is determined approximately as the mid-point of the two middle items.

6. It is erratic if the number of items is small.

Usefulness. The median is useful for distributions containing open-end intervals, since these intervals do not enter into its computation. Also, since the median is affected by the number rather than the size of items, it is frequently used instead of the mean as a measure of central tendency in cases where extreme values are likely to distort the mean.
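Limitation 3 can be checked with a small experiment using Python's standard `statistics` module. The two pairs of groups below are hypothetical data chosen for illustration. Within each pair the group sizes and group medians are identical (3 items each, medians 2 and 20), yet the combined medians differ, whereas the combined mean can always be recovered from group sizes and means. The helper name `pooled_mean` is hypothetical.

```python
from math import isclose
from statistics import mean, median


def pooled_mean(groups):
    """Combined mean computed from group sizes and group means alone."""
    n = sum(len(g) for g in groups)
    return sum(len(g) * mean(g) for g in groups) / n


# Hypothetical groups: same sizes and same group medians (2 and 20) in both pairs
pair_a = ([1, 2, 3], [10, 20, 30])
pair_b = ([0, 2, 2], [20, 20, 40])

for g1, g2 in (pair_a, pair_b):
    combined = g1 + g2
    print(median(g1), median(g2), median(combined))   # combined medians: 6.5, then 11.0
    assert isclose(pooled_mean((g1, g2)), mean(combined))
```

Identical group medians lead to combined medians of 6.5 and 11.0. No formula in the group medians alone can therefore give the combined median.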
Related Positional Measures

Besides median, there are other measures which divide a series into equal parts. Important amongst these are quartiles, deciles and percentiles. Quartiles are those values of the variate which divide the total frequency into four equal parts, deciles divide the total frequency into 10 equal parts and percentiles divide the total frequency into 100 equal parts. Just as one point divides a series into two parts, three points would divide it into four parts, 9 points into 10 parts and 99 points into 100 parts. Consequently there are only 3 quartiles, 9 deciles and 99 percentiles for a series. The quartiles are denoted by the symbol Q, deciles by D and percentiles by P. The subscripts 1, 2, 3, etc., beneath Q, D, etc., refer to the particular value that we want to compute. Thus Q1 would denote the first quartile, Q2 the second quartile, Q3 the third quartile, D1 the first decile, D8 the 8th decile, P1 the first percentile and P60 the 60th percentile, etc.


Graphically any set of these partition values divides the area of the frequency curve or histogram into equal parts. If vertical lines are drawn at the three quartiles, for example, the area of the histogram will be divided by these lines into four equal parts. The 9 deciles divide the area of the histogram or frequency curve into 10 equal parts, and the 99 percentiles divide the area into 100 equal parts.


In economics and business quartiles are more widely used than deciles and percentiles. The quartiles are the points on the X-scale that divide the distribution into four equal parts. Obviously there are three quartiles, the second coinciding with the median. More precisely stated, the lower quartile, Q1, is that point on the X-scale such that one-fourth of the total frequency is less than Q1 and three-fourths is greater than Q1. The upper quartile, Q3, is that point on the X-scale such that three-fourths of the total frequency is below Q3 and one-fourth is above it.


The deciles and percentiles are important in psychological and educational statistics concerning grades, rates, scores and ranks; they are of use in economics and business statistics in personnel work, productivity ratings and other such situations.


It should be noted that quartiles, deciles, etc., are not averages; they are measures of dispersion and as such shall be discussed in detail in the next chapter. Here only a passing reference is given. The method of computing these partition values is the same as discussed for median.

Just as quartiles divide the series into 4 equal parts, quintiles divide it into 5 equal parts, septiles into 7 equal parts and octiles into 8 equal parts. However, these partition values are rarely used in practice.


Computation of Quartiles, Deciles, Percentiles, etc.

The procedure for computing quartiles, deciles, etc., is the same as for the median. While computing these values in individual and discrete series we add 1 to N, whereas in continuous series we do not add 1.

Thus
Q1 = Size of ((N + 1)/4)th item (in individual observations and discrete series)
Q1 = Size of (N/4)th item (in continuous series)
Q3 = Size of (3(N + 1)/4)th item (in individual and discrete series)
Q3 = Size of (3N/4)th item (in continuous series)
D4 = Size of (4(N + 1)/10)th item (in individual and discrete series)
D4 = Size of (4N/10)th item (in continuous series)
P60 = Size of (60(N + 1)/100)th item (in individual and discrete series)
P60 = Size of (60N/100)th item (in continuous series)
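These position formulas differ only in whether 1 is added to N. The small Python sketch below uses the hypothetical name `partition_rank`. It returns the rank (not yet the value) of the k-th of `parts` partition values; the value itself is then read off or interpolated exactly as for the median.

```python
def partition_rank(k, parts, n, continuous):
    """Rank of the k-th of `parts` partition values (Q3: k=3, parts=4).

    Individual and discrete series use N + 1; continuous series use N.
    """
    if not 0 < k < parts:
        raise ValueError("k must lie between 1 and parts - 1")
    base = n if continuous else n + 1
    return k * base / parts


print(partition_rank(1, 4, 199, continuous=False))    # Q1 rank for 199 items: 50.0
print(partition_rank(60, 100, 80, continuous=True))   # P60 rank when N = 80: 48.0
```

With k = 1 and parts = 2 the same function gives the median rank, e.g., 100.0 for the 199-item group discussed earlier.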


Illustration 20. From the following data compute the value of upper and lower quartile, D2, P5 and P90:

Marks            Below 10   10–20   20–40   40–60   60–80   Above 80
No. of Students      8        10      22      25      10        5

Solution:        CALCULATION OF VARIOUS PARTITION VALUES

Marks       No. of students   c.f.
                  f
Below 10          8             8
10–20            10            18
20–40            22            40
40–60            25            65
60–80            10            75
Above 80          5            80
                N = 80

Q1 = Size of (N/4)th item = 80/4 = 20th item.
Hence Q1 lies in the class 20–40.

Q1 = L + ((N/4 − c.f.)/f) × i

L = 20, N/4 = 20, c.f. = 18, f = 22, i = 20

Q1 = 20 + ((20 − 18)/22) × 20 = 20 + 1.8 = 21.8

Q3 = Size of (3N/4)th item = Size of (3 × 80)/4 = 60th item.
Hence Q3 lies in the class 40–60.

Q3 = L + ((3N/4 − c.f.)/f) × i

L = 40, 3N/4 = 60, c.f. = 40, f = 25, i = 20

Q3 = 40 + ((60 − 40)/25) × 20 = 40 + 16 = 56
D2 = Size of (2N/10)th item = Size of (2 × 80)/10 = 16th item.
Hence D2 lies in the class 10–20.

D2 = L + ((2N/10 − c.f.)/f) × i

L = 10, 2N/10 = 16, c.f. = 8, f = 10, i = 10

D2 = 10 + ((16 − 8)/10) × 10 = 10 + 8 = 18


P5 = Size of (5N/100)th item = (5 × 80)/100 = 4th item.
Hence P5 lies in the class 0–10. [Below 10 means the lower limit is 0.]

P5 = L + ((5N/100 − c.f.)/f) × i

L = 0, 5N/100 = 4, c.f. = 0, f = 8, i = 10

P5 = 0 + ((4 − 0)/8) × 10 = 0 + 5 = 5


(It should be noted that if the quartile, decile, etc., lies in the first class, then the cumulative frequency of the preceding class shall be taken to be zero.)
P90 = Size of (90N/100)th item = (90 × 80)/100 = 72nd item.
Hence P90 lies in the class 60–80.

P90 = L + ((90N/100 − c.f.)/f) × i

L = 60, 90N/100 = 72, c.f. = 65, f = 10, i = 20

P90 = 60 + ((72 − 65)/10) × 20 = 60 + 14 = 74
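All five answers of Illustration 20 follow from one interpolation routine, since each partition value changes only the rank kN/parts. The sketch below is illustrative and uses a hypothetical function name. The upper limit of 100 given to 'Above 80' is an assumption; it does not influence any of the values computed here.

```python
def partition_value(lower, upper, freqs, k, parts):
    """k-th of `parts` partition values of a continuous series by interpolation:
    L + (kN/parts - c.f.) / f * i   (Q1: k=1, parts=4; P90: k=90, parts=100)."""
    rank = k * sum(freqs) / parts
    cf = 0                                # zero before the first class, as noted above
    for L, U, f in zip(lower, upper, freqs):
        if cf + f >= rank:
            return L + (rank - cf) / f * (U - L)
        cf += f
    raise ValueError("class not found")


# 'Below 10' is taken as 0-10 (as in the text); 'Above 80' is given an
# assumed upper limit of 100, which does not affect any value computed here.
lower = [0, 10, 20, 40, 60, 80]
upper = [10, 20, 40, 60, 80, 100]
freqs = [8, 10, 22, 25, 10, 5]
for name, k, parts in [("Q1", 1, 4), ("Q3", 3, 4), ("D2", 2, 10),
                       ("P5", 5, 100), ("P90", 90, 100)]:
    print(name, round(partition_value(lower, upper, freqs, k, parts), 2))
```

Run as shown, it prints Q1 21.82, Q3 56.0, D2 18.0, P5 5.0 and P90 74.0. These agree with the values worked out above (21.8, 56, 18, 5 and 74).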


Determination of Median, Quartiles, etc., Graphically 


Median can be determined graphically* by applying either of the following two methods:

1. Draw two ogives, one by the 'less than' method and the other by the 'more than' method. From the point where both these curves intersect each other draw a perpendicular on the X-axis. The point where this perpendicular touches the X-axis gives the value of median.

2. Draw only one ogive by the 'less than' method. Take the variable on the X-axis and frequency on the Y-axis. Determine the median item by the formula: median = Size of (N/2)th item. Locate this number on the Y-axis and from it draw a perpendicular on the cumulative frequency curve. From the point where it meets the ogive draw another perpendicular on the X-axis; the point where it meets the X-axis is the median.

*The value obtained graphically will be the same as that obtained algebraically, except for errors in plotting and reading the scale.

The other partition values like quartiles, deciles, etc., can also be determined graphically by following method No. 2.
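The two-ogive method can be imitated numerically if each ogive is drawn by joining its plotted points with straight lines. This is an assumption; a freehand curve may give a slightly different reading. The Python sketch below uses the hypothetical name `ogive_median` and the boundaries and frequencies of Illustration 21, which follows.

```python
from itertools import accumulate


def ogive_median(bounds, freqs):
    """x at which the 'less than' and 'more than' ogives cross, when each
    ogive is drawn by joining its plotted points with straight lines."""
    n = sum(freqs)
    less = [0] + list(accumulate(freqs))   # 'less than' c.f. at each boundary
    more = [n - c for c in less]           # 'more than' c.f. at each boundary
    for j in range(len(bounds) - 1):
        d0, d1 = less[j] - more[j], less[j + 1] - more[j + 1]
        if d0 <= 0 <= d1 and d1 != d0:     # the curves cross in this class
            t = -d0 / (d1 - d0)
            return bounds[j] + t * (bounds[j + 1] - bounds[j])
    raise ValueError("the ogives do not cross")


bounds = [20, 40, 60, 80, 100, 120, 140, 160]   # class boundaries, Illustration 21
freqs = [4, 6, 10, 16, 12, 7, 3]
print(ogive_median(bounds, freqs))               # 91.25
```

At every boundary the 'more than' count is N minus the 'less than' count. The curves therefore cross exactly where the 'less than' ogive reaches N/2, which is why both graphical methods agree with the interpolation formula when straight segments are used.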


Illustration 21. Determine the median wage graphically from the following data:

Wages      No. of       Wages      No. of
(in Rs.)   workers      (in Rs.)   workers
20–40         4         100–120      12
40–60         6         120–140       7
60–80        10         140–160       3
80–100       16

Solution: Method 1. Draw two cumulative frequency curves, one by the 'less than' method and another by the 'more than' method. From the point where both these curves meet draw a perpendicular on the X-axis; the point where it meets the X-axis is the median.

Wages less than   No. of workers     Wages more than   No. of workers
(Rs.)                                (Rs.)
 40                  4                20                 58
 60                 10                40                 54
 80                 20                60                 48
100                 36                80                 38
120                 48               100                 22
140                 55               120                 10
160                 58               140                  3
                                     160                  0

Method 2. If we draw only one ogive, say, by the 'less than' method, we can also determine the value of median from it. This is shown by the following graph:

Med. = Size of (N/2)th item = 58/2 = 29th item.

Take 29 on the Y-axis and draw a perpendicular on the ogive. From the point where it meets the ogive draw another perpendicular on the X-axis.

[Figure: Locating median graphically; X-axis: Wages (Rs.)]

It is clear from the above graph that the median wage is Rs. 91.25.


Illustration 22. Draw an ogive for the following distribution. Read the median from the graph, and verify the result by calculation. How many workers earned wages between Rs. 60 and Rs. 72?

Weekly wages   No. of workers     Weekly wages   No. of workers
(in Rupees)                       (in Rupees)
50–55               6             70–75              16
55–60              10             75–80              12
60–65              22             80–100             15
65–70              30
                                         (B. Com., Delhi, 1972)

Solution:

Wages less than   No. of workers    Wages less than   No. of workers
(Rs.)             (c.f.)            (Rs.)             (c.f.)
 55                  6               75                  84
 60                 16               80                  96
 65                 38              100                 111
 70                 68

Median = Size of (N/2)th item = 111/2 = 55.5th item.

Graphically the value of median = 67.9 or 68 (approx.).
Number of workers whose weekly wages are between Rs. 60 and Rs. 72 is 57, as shown below:
Number of workers whose wages are less than Rs. 72 = 73
Number of workers whose wages are less than Rs. 60 = 16
Number of workers whose wages are between Rs. 60 and Rs. 72 = (73 − 16) = 57

                    CALCULATION OF MEDIAN

Weekly wages   No. of workers   c.f.     Weekly wages   No. of workers   c.f.
(in Rs.)                                 (in Rs.)
50–55               6             6      70–75              16           84
55–60              10            16      75–80              12           96
60–65              22            38      80–100             15          111
65–70              30            68
                                                          N = 111

Median = Size of (N/2)th item = 111/2 = 55.5th item.
Hence median lies in the class 65–70.

Median = L + ((N/2 − c.f.)/f) × i

L = 65, N/2 = 55.5, c.f. = 38, f = 30, i = 5

Median = 65 + ((55.5 − 38)/30) × 5 = 65 + 2.92 = 67.92

The value of median is thus the same by both mathematical and graphic methods.

Illustration 23. You are given the net profits earned by some companies. Estimate graphically the percentage of companies getting profits between Rs. 25,000 and Rs. 75,000.

Profits in Rs.       No. of companies    Profits in Rs.        No. of companies
10,000–20,000             15             60,000–70,000               22
20,000–30,000             35             70,000–80,000               12
30,000–40,000             47             80,000–90,000               11
40,000–50,000             68             90,000–1,00,000              8
50,000–60,000             32
                                                         (C.A., May, 1972)

Solution: Finding percentages from the given data:

Profits less   No. of      Percentage      Profits less    No. of      Percentage
than           companies   of companies    than            companies   of companies
Rs. 20,000        15           6.0         Rs. 70,000        219          87.6
Rs. 30,000        50          20.0         Rs. 80,000        231          92.4
Rs. 40,000        97          38.8         Rs. 90,000        242          96.8
Rs. 50,000       165          66.0         Rs. 1,00,000      250         100.0
Rs. 60,000       197          78.8
Plotting the data on the graph paper:

[Figure: Percentage of companies earning profits between Rs. 25,000 and Rs. 75,000]

The graph shows clearly that the percentage of companies earning profits less than Rs. 75,000 is 92 and the percentage of companies earning profits less than Rs. 25,000 is 13. Thus the percentage of companies making profits between Rs. 25,000 and Rs. 75,000 is (92 − 13) = 79.


C. MODE* 


The mode or the modal value is that value in a series of observations which occurs with the greatest frequency. For example, the mode of the series 3, 5, 8, 5, 4, 5, 9, 3 would be 5, since this value occurs more frequently than any of the others. If a graph of the distribution is available, the mode is readily ascertainable as the abscissa of the highest point of the distribution curve.


The mode is often said to be the value which occurs most often, that is, with the highest frequency. While this statement is quite helpful in interpreting the mode, it cannot safely be applied to every distribution, because of the vagaries of sampling. Even fairly large samples drawn from a statistical population with a single well-defined mode may exhibit very erratic fluctuations in this average if the mode is defined as that exact value in the ungrouped data of each sample which occurs most frequently. Rather, it should be thought of as the value about which the items are most closely concentrated. It is the value which has the greatest frequency density in its immediate neighbourhood.† For this reason mode is also called the most typical or fashionable value of a distribution.

*The mode is that value of a series which appears more frequently than any other. Thus it is found by discovering the value that appears most frequently.

†Alva M. Tuttle: Elementary Business and Economic Statistics.
The following diagram shows the modal value:

The value of the variable at which the curve reaches a maximum is called the mode. It is the value around which the items tend to be most heavily concentrated.

Although mode is that value which occurs most frequently, it does not follow that its frequency represents a majority of the total number of frequencies. For example, in the election of a college president the votes obtained by three candidates contesting for presidentship, out of a total of 816 votes polled, are as follows:

Mr. X 268; Mr. Y 278; Mr. Z 270; Total 816.

Mr. Y will be elected as president because he has obtained the highest number of votes. But it would be wrong to say that he represents the majority, because there are more votes against him (268 + 270 = 538) than for him.


There are many situations in which the arithmetic mean and median fail to reveal the true characteristic of data. For example, when we talk of the most common wage, most common income, most common height, or most common size of shoe or ready-made garments, we have in mind the mode and not the arithmetic mean or median discussed earlier. The mean does not always provide an accurate reflection of the data due to the presence of extreme items. Median may also prove to be quite unrepresentative of the data owing to an uneven distribution of the series. For example, the values in the lower half of a distribution may range from, say, Rs. 10 to Rs. 100, while the same number of items in the upper half of the series range from Rs. 100 to Rs. 6,000, with most of them near the higher limit. In such a distribution the median value of Rs. 100 will provide little indication of the true nature of the data.

Both these shortcomings may be overcome by the use of mode, which refers to the value which occurs most frequently in a distribution. Moreover, mode is the easiest to compute, since it is the value corresponding to the highest frequency. For example, if the data are:

Size of shoes     5    6    7    8    9   10   11
No. of persons   10   20   25   40   22   15    6

the modal size is '8', since it appears the maximum number of times in the series.
Calculation of Mode

Determining the precise value of the mode of a frequency distribution is by no means an elementary calculation. Essentially it involves fitting mathematically some appropriate type of frequency curve to the grouped data and determining the value on the X-axis below the peak of the curve. However, there are several elementary methods of estimating the mode. These methods are discussed below for individual observations, discrete series and continuous series.


Calculation of Mode—Individual Observations 


For determining mode, count the number of times the various values repeat themselves; the value which occurs the maximum number of times is the modal value. The more often the modal value appears relatively, the more valuable the measure is as an average to represent the data.


Illustration 24. Calculate the mode from the following data of the marks obtained by 10 students:

S. No.   Marks obtained     S. No.   Marks obtained
  1           10               6          27
  2           27               7          20
  3           24               8          18
  4           12               9          15
  5           27              10          30
                                 (B. Com., Mysore, 1973)

Solution:                    CALCULATION OF MODE

Size of item   Number of times it occurs     Size of item   Number of times it occurs
    10                    1                       20                    1
    12                    1                       24                    1
    15                    1                       27                    3
    18                    1                       30                    1
                                                  Total                10

Since the item 27 occurs the maximum number of times, i.e., 3, the modal marks are 27.


Note. Thus the process of determining mode in case of individual observations essentially involves grouping of data.


When there are two or more values having the same maximum frequency, one cannot say which is the modal value, and hence mode is said to be ill-defined. Such a series is also known as bi-modal or multi-modal. For example, observe the following data:

Income (in Rs.) 110, 120, 130, 120, 110, 140, 130, 120, 130, 140.

Size of item              110   120   130   140
No. of times it occurs      2     3     3     2

Since 120 and 130 have the same maximum frequency, i.e., 3, mode is ill-defined in this case.
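Counting repetitions is what `collections.Counter` in the Python standard library does. The individual-observation method, including the ill-defined case, can therefore be sketched as follows; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """All values occurring with the greatest frequency, in ascending order."""
    if not values:
        raise ValueError("no observations")
    counts = Counter(values)                     # number of times each value occurs
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


marks = [10, 27, 24, 12, 27, 27, 20, 18, 15, 30]               # Illustration 24
incomes = [110, 120, 130, 120, 110, 140, 130, 120, 130, 140]
print(modes(marks))     # [27]
print(modes(incomes))   # [120, 130]
```

A result containing more than one value signals the ill-defined (bi-modal or multi-modal) case described above.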


Calculation of Mode— Discrete Series 


In discrete series quite often mode can be determined just by inspection, i.e., by looking at that value of the variable around which the items are most heavily concentrated. For example, observe the following data:

Size of garment              28   29   30   31   32   33
No. of persons wearing       10   20   40   65   50   15

From the above data we can clearly say that the modal size is 31, because the value 31 has occurred the maximum number of times, i.e., 65. However, where the mode is determined just by inspection, an error of judgment is possible in those cases where the difference between the maximum frequency and the frequency preceding or succeeding it is very small and the items are heavily concentrated on either side. In such cases it is desirable to prepare a grouping table and an analysis table. These tables help us in ascertaining the modal class.


A grouping table has six columns. In column 1 the maximum frequency is marked or put in a circle; in column 2 frequencies are grouped in two's; in column 3 leave the first frequency and then group the remaining in two's; in column 4 group the frequencies in three's; in column 5 leave the first frequency and group the remaining in three's; and in column 6 leave the first two frequencies and then group the remaining in three's. In each of these cases take the maximum total and mark it in a circle or by bold type.

After preparing the grouping table, prepare an analysis table. While preparing this table put column numbers on the left-hand side and the various probable values of mode on the right-hand side. The values against which frequencies are the highest are marked in the grouping table and then entered by means of a bar in the relevant 'box' corresponding to the values they represent.

The procedure of preparing the grouping table and analysis table shall be clear from the following example:


Illustration 25. From the following data of the height of 100 persons in a com- 
mercial concern determine the modal height : 


Heightin inches: 58 60 61 65 63 64 65 66 68 70 


No. of persons : 4 6 5 10 20.7722 24 6 2 1 
. 

Solution : GROUPING TABLE 

ЕАУ 7 ЕА 0 Зра анаа а in GS. 
Height in Frequency ` 
inches Col. 1 Col. 2 Col. 3 Col.4 , Col.5 Col. 6 

58 4 1 
60 6 10 - т 
61 5 15 i 1.21 1 
62 10 30 J | 35 
63 20 ! ^42 1 52 | 
64 22 46 | 66 
65 24 30 1 1 52 
66 6 $ 2 J 
68 2 345) | 9 
70 1 ) 
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ANALYSIS TABLE 


Heihht in Inches 

Col. No. 58 60 61 62 63 я 65 6 68 70 
Oe ÉL 

1 1 

п 1 1 

Ш 1 1 

iv 1 1 1 

M 1 1 1 

VI 1 1 1 
——MM— HM e 

Total 1 3 5 4 1 


Since the value 64 has occurred the maximum number of times, i.e., 5, the modal
height is 64 inches. It should be noted that by inspection one is likely to say that the
modal value is 65 since it has the maximum frequency, i.e., 24. But this is
incorrect, as revealed by the grouping table and the analysis table.
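The grouping-and-analysis procedure can be mechanised. The following is a minimal Python sketch, assuming the six column schemes exactly as described in the text (single items, pairs, pairs after skipping one, threes, threes after skipping one or two) and ignoring incomplete groups at the foot of a column; the function name `grouping_mode` is illustrative, not from the text.

```python
from collections import Counter


def grouping_mode(values, freqs):
    """Return (modal values, bar counts) from a six-column grouping table.

    Each column scheme is (group size, number of leading items skipped).
    Incomplete groups at the end of a column are ignored.
    """
    schemes = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    bars = Counter()
    for size, skip in schemes:
        starts = range(skip, len(freqs) - size + 1, size)
        totals = {s: sum(freqs[s:s + size]) for s in starts}
        best = max(totals.values())
        # Every group reaching the column maximum gets a bar (ties included).
        for s, total in totals.items():
            if total == best:
                for value in values[s:s + size]:
                    bars[value] += 1
    top = max(bars.values())
    return [v for v in values if bars[v] == top], bars


heights = [58, 60, 61, 62, 63, 64, 65, 66, 68, 70]
persons = [4, 6, 5, 10, 20, 22, 24, 6, 2, 1]
modes, bars = grouping_mode(heights, persons)
print(modes, dict(bars))  # modes == [64]; 64 gets 5 bars, 65 gets 4
```

Because ties within a column all receive bars, the same sketch also reproduces the bimodal result for the weight data of Illustration 28 (classes 130-140 and 140-150 with 5 bars each).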


Calculation of Mode— Continuous Series 
Steps: (i) By preparing a grouping table and an analysis table, or by
inspection, ascertain the modal class.
(ii) Determine the value of mode by applying the following formula:

$$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i \qquad (5)$$


where $L$ = lower limit of the modal class; $\Delta_1$ = the difference between
the frequency of the modal class and the frequency of the pre-modal class,
i.e., the preceding class (ignoring signs); $\Delta_2$ = the difference between the
frequency of the modal class and the frequency of the post-modal class, i.e.,
the succeeding class (ignoring signs); $i$ = the class-interval of the modal class.


Another form of this formula is:

$$Mo = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$

where $L$ = lower limit of the modal class; $f_1$ = frequency of the modal
class; $f_0$ = frequency of the class preceding the modal class; $f_2$ = frequency
of the class succeeding the modal class.
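As a quick check that the two forms agree, here is a small Python sketch (the function name is illustrative) that evaluates the formula from $L$, $f_1$, $f_0$, $f_2$ and $i$. It assumes the modal class has already been identified and that the class intervals are equal; the sample inputs are taken from Illustrations 27 and 30 of this section.

```python
def interpolated_mode(lower, f1, f0, f2, width):
    """Mo = L + (f1 - f0) / (2*f1 - f0 - f2) * i.

    lower: L, lower limit of the modal class
    f1:    frequency of the modal class
    f0:    frequency of the preceding class
    f2:    frequency of the succeeding class
    width: i, the (common) class interval
    """
    d1 = f1 - f0  # Delta_1
    d2 = f1 - f2  # Delta_2
    if d1 + d2 == 0:
        raise ValueError("formula undefined: modal class not higher than its neighbours")
    return lower + d1 / (d1 + d2) * width


print(interpolated_mode(40, 85, 55, 60, 10))    # Illustration 27
print(interpolated_mode(200, 27, 18, 20, 100))  # Illustration 30
```

Since $\Delta_1 + \Delta_2 = 2f_1 - f_0 - f_2$, computing the two differences first is just the first form written out.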


While applying the above formula for calculating mode, it is
necessary to see that the class intervals are uniform throughout. If they
are unequal, they should first be made equal on the assumption that the
frequencies are equally distributed throughout the class; otherwise we will
get misleading results.


There may be two values which occur with equal (highest) frequency. The
distribution is then called bimodal.* The following is a graph of a bimodal
distribution:

[Figure: graph of a bimodal distribution]


* There may be more than two values having the same highest frequency. Such a
distribution is called multimodal.
In a bimodal distribution the value of mode cannot be determined by
the formula given above. If plotted data produce a bimodal
distribution, the data themselves should be questioned. Quite often such a
condition is caused when the size of the sample is small; the difficulty
can be remedied by increasing the sample size. Another common cause
is the use of non-homogeneous data. In instances where a distribution
is bimodal and nothing can be done to change it, the mode should not be
used as a measure of central tendency.


Where mode is ill-defined, its value may be ascertained by the
following formula based upon the relationship between mean, median and
mode:

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$

This measure is called the empirical mode.

Illustration 26. Calculate mode from the following data:

Marks        No. of students    Marks        No. of students
Above 0           80            Above 60          28
Above 10          77            Above 70          16
Above 20          72            Above 80          10
Above 30          65            Above 90           8
Above 40          55            Above 100          0
Above 50          43

                                            (B. Com., Andhra, 1972)


Solution. Since this is a cumulative frequency distribution, we first convert it
into a simple frequency distribution.

Marks     No. of students     Marks     No. of students
0-10            3             50-60          15
10-20           5             60-70          12
20-30           7             70-80           6
30-40          10             80-90           2
40-50          12             90-100          8


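By inspection the modal class is 50-60.

$$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

$L = 50$; $\Delta_1 = (15 - 12) = 3$; $\Delta_2 = (15 - 12) = 3$; $i = 10$

$$Mo = 50 + \frac{3}{3 + 3} \times 10 = 50 + 5 = 55$$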
Illustration 27. Compute mode from the following data:

Size         Frequency     Size         Frequency
Below 10         5         Below 50        190
Below 20        20         Below 60        250
Below 30        50         Below 70        295
Below 40       105         Below 80        320
                                      (B. Com., Marathwada, 1972)
Solution:                CALCULATION OF MODE

Size      Frequency     Size      Frequency
0-10           5        40-50         85
10-20         15        50-60         60
20-30         30        60-70         45
30-40         55        70-80         25

By inspection the mode lies in the class 40-50.

$$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

$L = 40$, $\Delta_1 = (85 - 55) = 30$, $\Delta_2 = (85 - 60) = 25$, $i = 10$

$$Mo = 40 + \frac{30}{30 + 25} \times 10 = 40 + \frac{300}{55} = 40 + 5.45 = 45.45$$
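Illustrations 26 and 27 both start from cumulative data. A minimal Python sketch of the conversion and the "mode by inspection" step follows; the function names are hypothetical, and the sketch assumes a single highest class (ties would need the grouping-table method) and treats a missing neighbouring class as having frequency 0.

```python
def from_cumulative(counts, kind):
    """Convert cumulative counts to simple class frequencies.

    kind="above": counts like 'Above 0', 'Above 10', ... (more-than type)
    kind="below": counts like 'Below 10', 'Below 20', ... (less-than type)
    """
    if kind == "above":
        return [a - b for a, b in zip(counts, counts[1:])]
    if kind == "below":
        return [counts[0]] + [b - a for a, b in zip(counts, counts[1:])]
    raise ValueError("kind must be 'above' or 'below'")


def mode_by_inspection(first_lower, width, freqs):
    """Pick the class with the highest frequency and interpolate within it."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return first_lower + k * width + (f1 - f0) / (2 * f1 - f0 - f2) * width


marks = from_cumulative([80, 77, 72, 65, 55, 43, 28, 16, 10, 8, 0], "above")
sizes = from_cumulative([5, 20, 50, 105, 190, 250, 295, 320], "below")
print(marks, mode_by_inspection(0, 10, marks))  # Illustration 26
print(sizes, mode_by_inspection(0, 10, sizes))  # Illustration 27
```

A quick sanity check on any conversion is that the simple frequencies must add back to the total (80 and 320 here).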


Illustration 28. From the following data of the weight of 122 persons determine
the modal weight:

Weight (in lb.)   No. of persons   Weight (in lb.)   No. of persons
100-110                 4          140-150                33
110-120                 6          150-160                17
120-130                20          160-170                 8
130-140                32          170-180                 2


Solution. By inspection it is difficult to say which is the modal class. Hence 
we prepare a grouping table and an analysis table. 
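GROUPING TABLE
(Each total is entered against the first class of the group it covers; the
maximum in each column is shown in parentheses. Incomplete groups at the foot
of a column are omitted.)

Weight    Col. 1  Col. 2  Col. 3  Col. 4  Col. 5  Col. 6
(lb.)
100-110        4      10              30
110-120        6              26            (58)
120-130       20    (52)                            (85)
130-140       32            (65)    (82)
140-150     (33)      50                    (58)
150-160       17              25                      27
160-170        8      10
170-180        2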
ANALYSIS TABLE

          Class in which mode is expected to lie*
Col. No.  120-130  130-140  140-150
I                              1
II           1        1
III                   1        1
IV                    1        1
V            1        1        1
VI           1        1        1
Total        3        5        5


*While preparing the analysis table all the classes need not be written; only those
classes in which mode is expected to lie are considered. That is why classes 100-110,
150-160, etc., are left out.
This is a bimodal series (the classes 130-140 and 140-150 tie with 5 bars each).
Hence mode has to be determined indirectly by applying the formula:

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$

Median = Size of $\frac{N}{2}$th item = Size of $\frac{122}{2}$, i.e., the 61st item.
Hence median lies in the class 130-140.

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i$$

$L = 130$, $\frac{N}{2} = 61$, $c.f. = 30$, $f = 32$, $i = 10$.

$$\text{Median} = 130 + \frac{61 - 30}{32} \times 10 = 130 + 9.69 = 139.69$$


Calculation of Mean:

Weight in lb.   No. of persons   Mid-point    d'     fd'
                      f              m
100-110               4             105       -3     -12
110-120               6             115       -2     -12
120-130              20             125       -1     -20
130-140              32             135        0       0
140-150              33             145        1      33
150-160              17             155        2      34
160-170               8             165        3      24
170-180               2             175        4       8
                   N = 122                         Σfd' = 55

$A = 135$, $\Sigma fd' = 55$, $N = 122$, $i = 10$, where $d' = (m - 135)/10$

$$\bar{X} = A + \frac{\Sigma fd'}{N} \times i = 135 + \frac{55}{122} \times 10 = 135 + 4.51 = 139.51$$

Mode = 3 Median - 2 Mean
Mode = (3 × 139.69) - (2 × 139.51) = 419.07 - 279.02 = 140.05
Hence the modal weight is 140.05 lb.
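The indirect route used here (median and mean of the grouped data, then the empirical relation) can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names: it computes the mean directly from the mid-points rather than by the step-deviation shortcut in the table, which gives the same value.

```python
def grouped_median(lowers, width, freqs):
    """Median = L + (N/2 - c.f.) / f * i for a continuous series."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lowers, freqs):
        if cf + f >= half:
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("empty distribution")


def grouped_mean(lowers, width, freqs):
    """Arithmetic mean using class mid-points."""
    mids = [lower + width / 2 for lower in lowers]
    return sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)


lowers = list(range(100, 180, 10))  # 100-110, ..., 170-180
persons = [4, 6, 20, 32, 33, 17, 8, 2]
median = grouped_median(lowers, 10, persons)
mean = grouped_mean(lowers, 10, persons)
empirical_mode = 3 * median - 2 * mean
print(round(median, 2), round(mean, 2), round(empirical_mode, 2))
```

Working with unrounded median and mean gives the same 140.05 lb. as the text.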
Mode when Class Intervals are Unequal 
The formula for calculating the value of mode given above is
applicable only where there are equal class intervals. If the class intervals
are unequal, then we must make them equal before we start computing
the value of mode. The class intervals should be made equal and the frequencies
adjusted on the assumption that they are equally distributed throughout
the class.
Illustration 29. Calculate mode from the following data:

Marks        0-10   10-20   20-40   40-50   50-70
Frequency      5      15      40      32      28

Solution. Since the class intervals are unequal we must adjust the frequencies
and make the class intervals equal.

Marks     f        Marks     f
0-10      5        40-50     32
10-20    15        50-60     14
20-30    20        60-70     14
30-40    20
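By inspection mode lies in the class 40-50.

$L = 40$, $\Delta_1 = (32 - 20) = 12$, $\Delta_2 = (32 - 14) = 18$, $i = 10$

$$Mo = 40 + \frac{12}{12 + 18} \times 10 = 40 + 4 = 44$$

If we calculate mode without adjustment, the modal class would be 20-40 and the value would be

$L = 20$, $\Delta_1 = (40 - 15) = 25$, $\Delta_2 = (40 - 32) = 8$, $i = 20$

$$Mo = 20 + \frac{25}{25 + 8} \times 20 = 20 + 15.15 = 35.15$$

But this is not possible, since the adjusted data show that the mode lies in the
class 40-50 and so cannot be less than 40.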
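The adjustment step can be sketched in Python. This is a minimal sketch with hypothetical function names, assuming every original class width is a whole multiple of the chosen width and that each frequency is spread evenly over its sub-classes, as the text prescribes.

```python
def equalize(classes, width):
    """Split (lower, upper, freq) classes into classes of `width`,
    spreading each frequency evenly (widths assumed to be multiples of `width`)."""
    out = []
    for lower, upper, f in classes:
        parts = (upper - lower) // width
        for j in range(parts):
            out.append((lower + j * width, lower + (j + 1) * width, f / parts))
    return out


def mode_equal_classes(classes):
    """Interpolated mode within the (single) highest class of equal-width data."""
    freqs = [f for _, _, f in classes]
    k = max(range(len(freqs)), key=lambda j: freqs[j])
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


raw = [(0, 10, 5), (10, 20, 15), (20, 40, 40), (40, 50, 32), (50, 70, 28)]
adjusted = equalize(raw, 10)
print(adjusted)
print(mode_equal_classes(adjusted))  # Illustration 29
```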
Locating Mode Graphically 


In a frequency distribution the value of mode can also be determined
graphically. The steps in calculation are:

1. Draw a histogram of the given data.

2. Draw two lines diagonally inside the modal class bar: one from the
top right corner of the modal bar to the top right corner of the preceding
bar, and the other from the top left corner of the modal bar to the top left
corner of the succeeding bar.

3. Draw a perpendicular from the intersection of the two
diagonal lines to the X-axis (horizontal scale); the point where it meets
the axis gives the modal value.

Illustration 30. The monthly profits in rupees of 100 shops are distributed as
follows:

Profits per shop   No. of shops   Profits per shop   No. of shops
0-100                   12        300-400                 20
100-200                 18        400-500                 17
200-300                 27        500-600                  6

Draw a histogram of the data and thence find the modal value. Check this
value by direct calculation.                               (I.C.W.A., 1973)


Solution:

[Figure: histogram of monthly profits (in Rs.) with the mode located graphically]

MEASURES OF CENTRAL VALUE E-742 


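Direct calculation:
Mode lies in the class 200-300.

$$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

where $L = 200$, $\Delta_1 = (27 - 18) = 9$, $\Delta_2 = (27 - 20) = 7$, $i = 100$

$$Mo = 200 + \frac{9}{9 + 7} \times 100 = 200 + 56.25 = 256.25$$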
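The graphic construction and the formula agree because the crossing point of the two diagonals lies exactly at $L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$. The small Python sketch below intersects the two lines drawn in step 2 and confirms this for Illustration 30. The function name and the parametrisation by $t = (x - L)/i$ are illustrative choices, not part of the text.

```python
def diagonal_crossing(lower, width, f0, f1, f2):
    """x-coordinate where the two diagonals inside the modal bar cross.

    Line A: (L, f1) -> (L + i, f2)   modal top-left to next bar's top-left
    Line B: (L, f0) -> (L + i, f1)   previous bar's top-right to modal top-right
    Both are written in terms of t = (x - L) / i, with 0 <= t <= 1.
    """
    # A(t) = f1 + (f2 - f1) * t and B(t) = f0 + (f1 - f0) * t; solve A(t) = B(t).
    t = (f1 - f0) / ((f1 - f0) - (f2 - f1))
    return lower + t * width


print(diagonal_crossing(200, 100, f0=18, f1=27, f2=20))
```

The denominator $(f_1 - f_0) - (f_2 - f_1)$ simplifies to $2f_1 - f_0 - f_2$, which is why the reading from the histogram matches the direct calculation.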


Mode can also be determined from a frequency polygon, in which
case a perpendicular is drawn on the base from the apex of the polygon,
and the point where it meets the base gives the modal value.

However, the graphic method of determining mode can be used only
where there is one class containing the highest frequency. If two or more
classes have the same highest frequency, mode cannot be determined
graphically. For example, for the data given below mode cannot be
graphically ascertained.

Size of shoe   No. of persons   Size of shoe   No. of persons
2-4                 10          8-10                 8
4-6                 15          10-12                2
6-8                 15


Merits and Limitations of Mode 

Merits. 1. By definition mode is the most typical or representative
value of a distribution. Hence, when we talk of modal wage, modal size
of shoe or modal size of family, it is this average that we refer to. The
mode is another measure which actually does indicate what many people
incorrectly believe the arithmetic mean indicates: the mode is the most
frequently occurring value. If the modal wage in a factory is Rs. 116, then
more workers receive Rs. 116 than any other wage. This is what many
believe the "average" wage always indicates, but actually such a meaning
is indicated only if the average used is the mode.

2. It is not affected by extremely large or small items. For
example, the mode of the values 1, 4, 4 and 10 is 4 and the mode of the values 1,
4, 4 and 1,000 is also 4.

3. Its value can be determined in open-end distributions without
ascertaining the class limits.

4. It can be used to describe qualitative phenomena. For example,
if we want to compare the consumer preferences for different types of
products, say, soap, toothpastes, etc., or different media of advertising, we
should compare the modal preferences expressed by different groups of
people.

5. The value of mode can also be determined graphically, whereas
the value of mean cannot be graphically ascertained.


Limitations. The important limitations of this average are:

1. The value of mode cannot always be determined. In some cases
we may have a bimodal or multimodal series.

2. It is not capable of algebraic manipulation. For example, from
the modes of two sets of data we cannot calculate the overall mode of the
combined data. Similarly, the modal wage times the number of workers
will not give the total payroll, except, of course, when the distribution is
normal and then the mean, median and mode are all equal.

3. The value of mode is not based on each and every item of the
series.

4. It is not a rigidly defined measure. There are several formulae
for calculating the mode, all of which usually give somewhat different
answers. In fact, mode is the most unstable average and its true value is
difficult to determine.

5. While dealing with quantitative data, the disadvantages of the
mode outweigh its good features and hence it is seldom used.


Usefulness. The mode is employed when the most typical value
of a distribution is desired. It is the most meaningful measure of central
tendency in the case of highly skewed or non-normal distributions, as it
then provides the best indication of the point of heaviest concentration.


Relationship among Mean, Median and Mode 


A distribution in which the values of mean, median and mode
coincide is known as a symmetrical distribution. Conversely stated, when
the values of mean, median and mode are not equal the distribution is
known as asymmetrical or skewed. In moderately skewed or asymmetrical
distributions a very important relationship exists among mean, median
and mode. In such distributions the distance between the mean and the
median is about one-third the distance between the mean and the mode, as
will be clear from the following diagram:


[Figure: Relationship of arithmetic mean, median and mode. The median divides the area in halves; the mean is the centre of gravity.]


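Karl Pearson has expressed this relationship as follows:

$$\text{Mode} = \text{Mean} - 3(\text{Mean} - \text{Median})$$

or $$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$

and $$\text{Median} = \text{Mode} + \tfrac{2}{3}(\text{Mean} - \text{Mode})$$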


If we know any two of the three values, we can compute
the third from these relationships. The following example will illustrate
this point:

Illustration 31. In a moderately asymmetrical distribution the mode and mean
are 32.1 and 35.4 respectively. Calculate the median.
Solution: Mode = 3 Median - 2 Mean
Mode = 32.1 and Mean = 35.4
Substituting the values: 32.1 = 3 Median - 2(35.4) = 3 Median - 70.8
3 Median = 32.1 + 70.8 = 102.9
Median = 34.3
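Each of Pearson's relations can be rearranged to give the missing value from the other two. A small Python sketch (function names are illustrative) covering all three cases and checked against Illustration 31:

```python
def median_from_mean_mode(mean, mode):
    # From Mode = 3 Median - 2 Mean  =>  Median = (Mode + 2 Mean) / 3
    return (mode + 2 * mean) / 3


def mode_from_mean_median(mean, median):
    # Empirical mode: Mode = 3 Median - 2 Mean
    return 3 * median - 2 * mean


def mean_from_median_mode(median, mode):
    # Mean = (3 Median - Mode) / 2
    return (3 * median - mode) / 2


median = median_from_mean_mode(mean=35.4, mode=32.1)  # Illustration 31
print(round(median, 2))
```

These relations are only approximate and, as the text says, apply to moderately skewed distributions.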
D. GEOMETRIC MEAN 


Geometric mean is defined as the Nth root of the product of N items
or values. If there are two items, we take the square root; if there are
three items, the cube root; and so on. Symbolically,

$$G.M. = \sqrt[N]{X_1 \times X_2 \times X_3 \times \cdots \times X_N}$$

where $X_1, X_2, X_3$, etc. refer to the various items of the series.

When the number of items is three or more, the task of multiplying
the numbers and of extracting the root becomes excessively difficult. To
simplify calculations logarithms are used. Geometric mean then is calculated
as follows:

$$\log G.M. = \frac{\log X_1 + \log X_2 + \cdots + \log X_N}{N} = \frac{\Sigma \log X}{N}$$

or

$$G.M. = \text{Antilog}\left(\frac{\Sigma \log X}{N}\right)$$

In discrete series $G.M. = \text{Antilog}\left(\frac{\Sigma f \log X}{N}\right)$

In continuous series $G.M. = \text{Antilog}\left(\frac{\Sigma f \log m}{N}\right)$

Properties of Geometric Mean 

The following are two important mathematical properties of
geometric mean:

1. The product of the values of the series will remain unchanged when
the value of geometric mean is substituted for each individual value. For
example, the geometric mean for the series 2, 4, 8 is 4; therefore, we have
2 × 4 × 8 = 64 = 4 × 4 × 4.

2. The sum of the deviations of the logarithms of the original
observations above or below the logarithm of the geometric mean is
equal, i.e., $\Sigma(\log X - \log G.M.) = 0$. This also means that the value of the
geometric mean is such as to balance the ratio deviations of the observations
from it. Thus, using the same previous numbers, we find that

$$\frac{2}{4} \times \frac{4}{4} \times \frac{8}{4} = \frac{1}{2} \times 1 \times 2 = 1$$

Because of this property, this average is especially adapted to average
ratios, rates of change, and logarithmically distributed series.
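Both properties are easy to confirm numerically. A minimal Python sketch (the helper name is illustrative) that computes the G.M. through common logarithms, exactly as in the formula above, and checks the two properties for 2, 4, 8:

```python
import math


def geometric_mean(xs):
    """Antilog of the mean common logarithm; all items must be positive."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs strictly positive items")
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


xs = [2, 4, 8]
g = geometric_mean(xs)
# Property 1: replacing every item by G.M. leaves the product unchanged.
print(math.prod(xs), g ** len(xs))
# Property 2: log deviations from log G.M. sum to zero (up to rounding).
print(sum(math.log10(x) - math.log10(g) for x in xs))
```

The positivity check reflects the fact that logarithms (and hence the G.M.) are not defined for zero or negative items.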


Calculation of Geometric Mean—Individual Observations 
$$G.M. = \text{Antilog}\left(\frac{\Sigma \log X}{N}\right)$$

Steps: (i) Take the logarithms of the variable X and obtain the
total Σ log X.
(ii) Divide Σ log X by N and take the antilog of the value so
obtained. This gives the value of geometric mean.


Illustration 32. Calculate the geometric mean of the following:

2574, 475, 75, 5, 0.8, 0.08, 0.005, 0.0009
                                         (M.A. Econ., Agra, 1974)
                                         (B. Com., Mysore, 1967)

Solution:            CALCULATION OF GEOMETRIC MEAN

X          log X          X          log X
2574       3.4106         0.8        1̄.9031
475        2.6767         0.08       2̄.9031
75         1.8751         0.005      3̄.6990
5          0.6990         0.0009     4̄.9542
                                     Σ log X = 2.1208

$$G.M. = \text{Antilog}\left(\frac{\Sigma \log X}{N}\right) = \text{Antilog}\left(\frac{2.1208}{8}\right) = \text{Antilog}\ 0.2651 = 1.841$$
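The same calculation in Python, as a sketch: ordinary negative logarithms replace the bar notation (for example $\bar{1}.9031 = -0.0969$), so no characteristic/mantissa bookkeeping is needed.

```python
import math

values = [2574, 475, 75, 5, 0.8, 0.08, 0.005, 0.0009]
sum_log = sum(math.log10(x) for x in values)  # negative logs replace the bar notation
gm = 10 ** (sum_log / len(values))
print(round(sum_log, 4), round(gm, 3))
```

With unrounded logarithms the total is 2.1207 rather than the table's 2.1208 (four-figure rounding), and the geometric mean still comes out as 1.841.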
Calculation of Geometric Mean—Discrete Series 
$$G.M. = \text{Antilog}\left(\frac{\Sigma f \log X}{N}\right)$$

Steps: (i) Find the logarithms of the variable X.

(ii) Multiply these logarithms by the respective frequencies and
obtain the total Σ f log X.

(iii) Divide Σ f log X by the total frequency and take the antilog of
the value so obtained.


Illustration 33. Calculate the geometric mean of the data given below, giving
the number of families and the income per head of different classes of people in the
village Borgaon:

Class of            No. of     Income per head   Class of          No. of     Income per head
people              families   in 1967 (Rs.)     people            families   in 1967 (Rs.)
Landlords               1          1000          School teachers       3          100
Cultivators            50            80          Shopkeepers           4          150
Landless labourers     25            40          Carpenters            3          120
Money-lenders           2           750          Weavers               5           60
                                                               (B. Com., Nagpur, 1973)

Solution:              CALCULATION OF GEOMETRIC MEAN

Class of people      Income per head (Rs.)   No. of families    log X      f log X
                             X                     f
Landlords                  1,000                   1            3.0000      3.0000
Cultivators                   80                  50            1.9031     95.1550
Landless labourers            40                  25            1.6021     40.0525
Money-lenders                750                   2            2.8751      5.7502
School teachers              100                   3            2.0000      6.0000
Shopkeepers                  150                   4            2.1761      8.7044
Carpenters                   120                   3            2.0792      6.2376
Weavers                       60                   5            1.7782      8.8910
                                               N = 93              Σ f log X = 173.7907

$$G.M. = \text{Antilog}\left(\frac{\Sigma f \log X}{N}\right) = \text{Antilog}\left(\frac{173.7907}{93}\right) = \text{Antilog}\ 1.869 = 73.96$$
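For a discrete series the only change is that each logarithm is weighted by its frequency. A Python sketch of Illustration 33 follows. Note that with unrounded logarithms the quotient is about 1.8687 and the antilog about 73.91; the 73.96 in the text comes from rounding the quotient to 1.869 before taking the antilog.

```python
import math

income_per_head = [1000, 80, 40, 750, 100, 150, 120, 60]  # X (Rs.)
families = [1, 50, 25, 2, 3, 4, 3, 5]                       # f
n = sum(families)
sum_f_log = sum(f * math.log10(x) for x, f in zip(income_per_head, families))
gm = 10 ** (sum_f_log / n)
print(n, round(sum_f_log, 4), round(gm, 2))
```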
Calculation of Geometric Mean—Continuous Series

$$G.M. = \text{Antilog}\left(\frac{\Sigma f \log m}{N}\right)$$

Steps: (i) Find out the mid-points of the classes and take their
logarithms.
(ii) Multiply these logarithms by the respective frequencies of
each class and obtain the total Σ f log m.
(iii) Divide the total obtained in step (ii) by the total frequency and
take the antilog of the value so obtained.

Illustration 34. From the following data compute the value of geometric mean:

Marks              0-10   10-20   20-30   30-40   40-50
No. of students      8      12      20       6       4

Solution:             CALCULATION OF GEOMETRIC MEAN

Marks    No. of students   Mid-points    log m     f × log m
               f               m
0-10           8               5        0.6990       5.5920
10-20         12              15        1.1761      14.1132
20-30         20              25        1.3979      27.9580
30-40          6              35        1.5441       9.2646
40-50          4              45        1.6532       6.6128
            N = 50                             Σ f log m = 63.5406

$$G.M. = \text{Antilog}\left(\frac{\Sigma f \log m}{N}\right) = \text{Antilog}\left(\frac{63.5406}{50}\right) = \text{Antilog}\ 1.2708 = 18.66$$
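A compact Python sketch of the continuous-series steps (the function name is illustrative): take class mid-points, weight their logarithms by the frequencies, average, and take the antilog.

```python
import math


def grouped_geometric_mean(lowers, width, freqs):
    """Antilog of sum(f * log m) / N, m being the class mid-points."""
    mids = [lower + width / 2 for lower in lowers]
    total = sum(f * math.log10(m) for f, m in zip(freqs, mids))
    return 10 ** (total / sum(freqs))


gm = grouped_geometric_mean([0, 10, 20, 30, 40], 10, [8, 12, 20, 6, 4])
print(round(gm, 2))
```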


Specific Uses of Geometric Mean 
Geometric mean is specially useful in the following cases:

1. The geometric mean is used to find the average per cent increase
in sales, production, population or other economic or business series. For
example, from 1968 to 1970 prices increased by 5%, 10% and 18% respectively.
The average annual increase is not 11%, i.e., (5 + 10 + 18)/3, as given
by the arithmetic average, but about 10.9%, as obtained by the geometric mean.

2. Geometric mean is theoretically considered to be the best average
in the construction of index numbers. It makes the index numbers satisfy
the time reversal test and gives equal weight to equal ratios of change.

3. It is the average most suitable when large weights have to be
given to small items and small weights to large items, situations which we
usually come across in social and economic fields.
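The average per cent increase in point 1 is the geometric mean of the growth factors (1.05, 1.10, 1.18) minus 1. The following Python sketch (function name illustrative) takes the root directly instead of going through logarithm tables; the two routes are mathematically identical.

```python
import math


def average_percent_change(percent_changes):
    """Average per-period % change: geometric mean of the growth factors, minus 1."""
    factors = [1 + p / 100 for p in percent_changes]
    gm = math.prod(factors) ** (1 / len(factors))
    return (gm - 1) * 100


print(round(average_percent_change([5, 10, 18]), 2))  # geometric: about 10.9
print(sum([5, 10, 18]) / 3)                             # arithmetic: 11.0
```

For the rises of 5%, 8% and 77% in Illustration 35 (a) the same function gives about 26.14%, i.e., approximately 26%.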


The following examples illustrate the use of geometric mean : 


Illustration 35 (a). The price of a commodity increased by 5% from 1948 to 1949,
8% from 1949 to 1950 and 77% from 1950 to 1951. The average increase from 1948 to
1951 is quoted as 26% and not 30%. Explain and verify the arithmetic.
                                              (B.A. Hons. Econ., Delhi, 1959)


Solution. The appropriate average here is the geometric mean and not the
arithmetic mean. The arithmetic mean of 5, 8, 77 is 30 but this is not the correct
answer. The correct answer is obtained if we calculate the geometric mean.

% Rise    Price at the end of the year         log X
          taking preceding year as 100 (X)
5                    105                       2.0212
8                    108                       2.0334
77                   177                       2.2480
                                     Σ log X = 6.3026

$$G.M. = \text{Antilog}\left(\frac{\Sigma \log X}{N}\right) = \text{Antilog}\left(\frac{6.3026}{3}\right) = \text{Antilog}\ 2.1009 = 126.2$$

The average increase from 1948 to 1951
= 126.2 - 100 = 26.2%, or approx. 26%.


Verification. When the average rise is 30%:

Year        Rate of change    Total change    Price at the end of each year
I year      30% on 100.0          30.0              130.0
II year     30% on 130.0          39.0              169.0
III year    30% on 169.0          50.7              219.7

When the average rise is 26%:

I year      26% on 100.00         26.00             126.00
II year     26% on 126.00         32.76             158.76
III year    26% on 158.76         41.28             200.04

When the rise is 5%, 8% and 77%, the changed price at the end of each year is:

I year       5% on 100.0           5.0              105.0
II year      8% on 105.0           8.4              113.4
III year    77% on 113.4          87.318            200.718

The above calculations make it clear that in the second and third cases the price
at the end of the third year is almost the same, the slight difference being due to
approximation of 26.2 as 26. Hence, the average rise in price is 26%.


Illustration 35 (b). Compared to the previous year the overhead expenses went
up by 32% in 1961; they increased by 40% in the next year and 50% in the following
year. Calculate the average rate of increase in overhead expenses over the three years.
Explain clearly the reason for the choice of average.                   (C.A., 1972)

Solution. In averaging ratios and percentages the geometric mean is more
appropriate. Applying the geometric mean here:

% Rise    Expenses at the end of the year       log X
          taking preceding year as 100 (X)
32%                  132                        2.1206
40%                  140                        2.1461
50%                  150                        2.1761
                                      Σ log X = 6.4428

$$G.M. = \text{Antilog}\left(\frac{\Sigma \log X}{N}\right) = \text{Antilog}\left(\frac{6.4428}{3}\right) = \text{Antilog}\ 2.1476 = 140.5$$

Average rate of increase in overhead expenses
= 140.5 - 100 = 40.5%.
Illustration 35 (c). Find the average rate of increase in population which in the
first decade has increased by 15%, in the second decade by 20% and in the third decade
by 30%.
Solution. Applying the geometric mean:

% Rise    Population at the end of the decade     log X
          taking population of the previous
          decade as 100 (X)
15                   115                          2.0607
20                   120                          2.0792
30                   130                          2.1139
                                        Σ log X = 6.2538

$$G.M. = \text{Antilog}\left(\frac{\Sigma \log X}{N}\right) = \text{Antilog}\left(\frac{6.2538}{3}\right) = \text{Antilog}\ 2.0846 = 121.5$$

Thus the average rate of increase in population is (121.5 - 100), i.e., 21.5 per cent.


Illustration 35 (d). The annual rates of growth of output of a factory in 5 years
are 5.0, 7.5, 2.5, 5.0 and 10.0 per cent respectively. What is the compound rate of
growth of output per annum for the period?                 (B. Com., Delhi, 1968)
Solution. Apply the geometric mean:

Annual rate of growth    Output relatives at the end of the year (X)    log X
5.0                                  105.0                              2.0212
7.5                                  107.5                              2.0314
2.5                                  102.5                              2.0107
5.0                                  105.0                              2.0212
10.0                                 110.0                              2.0414
                                                          Σ log X = 10.1259

$$G.M. = \text{Antilog}\left(\frac{\Sigma \log X}{N}\right) = \text{Antilog}\left(\frac{10.1259}{5}\right) = \text{Antilog}\ 2.0252 = 105.9$$

Thus, the compound rate of growth of output per annum for the period is
105.9 - 100 = 5.9 per cent.
Note. The same result is obtained by applying the compound interest
formula which is discussed below.


Compound Interest Formula 

Geometric mean is most frequently used in the determination of
average per cent of change. For example, if a city had a population of
2,00,000 in a given year and 2,40,000 ten years later, we may be interested
in finding out the annual per cent of change. The increase is (2,40,000
- 2,00,000), i.e., 40,000 over a period of 10 years and so one may say
that the annual per cent increase is 2. However, if we compute a 2 per cent
increase each year over the preceding year, the population figure turns out
to be 2,43,800. This means that the correct figure is slightly smaller than
2 per cent because we are actually compounding. The average annual
per cent increase may be computed by applying the formula:

$$P_n = P_0 (1 + r)^n$$

where $P_0$ = the amount at the beginning of the period;
$P_n$ = the amount at the end of the period;
$r$ = rate of change;
$n$ = number of time periods.


It follows from the above formula that $r = \sqrt[n]{\dfrac{P_n}{P_0}} - 1$.

For the above data, $2{,}40{,}000 = 2{,}00{,}000\,(1 + r)^{10}$
Taking logarithms, $5.3802 = 5.3010 + 10 \log (1 + r)$
$\log (1 + r) = 0.00792$
$1 + r = \text{Antilog}\ 0.00792 = 1.0184$
$r = 0.0184 = 1.84$ per cent.

The expression $P_n = P_0(1 + r)^n$ is called the compound interest formula
because it is extremely useful in problems involving compound interest.
In the above case we have used it to determine the average annual per cent
of growth. However, if we know any three values of the four symbols used
in the formula, we can find out the fourth one. Thus we may determine:

(a) Average annual per cent of change, $r$.

(b) Population the given number of years later, $P_n$, assuming a
constant relative change.

(c) Number of years, $n$, after which the given population will be
attained, again assuming a constant relative change.

(d) Population the given number of years earlier, $P_0$, if the per cent
of change was constant.

It may be pointed out that the assumption of a constant relative
change for population is not valid over an extended period for any country
except possibly "new" countries.*
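The four uses (a) to (d) are just the formula $P_n = P_0(1 + r)^n$ solved for each symbol in turn. A Python sketch with illustrative function names, checked against the city-population example above and the data of Illustration 36:

```python
import math


def rate_of_change(p0, pn, n):
    """(a) r = (Pn / P0) ** (1/n) - 1"""
    return (pn / p0) ** (1 / n) - 1


def amount_after(p0, r, n):
    """(b) Pn = P0 (1 + r) ** n"""
    return p0 * (1 + r) ** n


def periods_needed(p0, pn, r):
    """(c) n = log(Pn / P0) / log(1 + r)"""
    return math.log(pn / p0) / math.log(1 + r)


def amount_before(pn, r, n):
    """(d) P0 = Pn / (1 + r) ** n"""
    return pn / (1 + r) ** n


r = rate_of_change(200_000, 240_000, 10)
print(round(r * 100, 2))                            # about 1.84 per cent
print(round(amount_after(200_000, 0.02, 10)))       # a steady 2% gives about 2,43,800
print(round(rate_of_change(84, 108, 10) * 100, 2))  # Illustration 36
```

Any base of logarithm works in (c) because the base cancels in the ratio; natural logarithms are used here only for convenience.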

Illustration 36. The population of a country has increased from 84 million in
1961 to 108 million in 1971. Find the annual rate of growth of population.

Solution. Let r be the rate of growth. Applying the compound interest
formula $P_n = P_0(1 + r)^n$:

$84\,(1 + r)^{10} = 108$
Taking logarithms, $10 \log (1 + r) = \log 108 - \log 84 = 2.0334 - 1.9243 = 0.1091$,
so $\log (1 + r) = 0.01091$
$1 + r = \text{Antilog}\ 0.01091 = 1.0254$
$r = 0.0254 = 2.54$ per cent.
Merits and Limitations of Geometric Mean 


Merits. 1. It is based on each and every item of the series.

2. It is rigidly defined.

3. It is useful in averaging ratios and percentages and in determining
rates of increase and decrease.

4. It gives less weight to large items and more to small ones than
does the arithmetic average. It is because of this reason that geometric
mean is never larger than the arithmetic mean; on occasions it may turn
out to be the same as the arithmetic mean, but usually it is smaller.

5. It is capable of algebraic manipulation. For example, if the
geometric means of two or more series and their numbers of items are
known, a combined G.M. can be easily calculated by applying the formula:

$$G.M._{12} = \text{Antilog}\left[\frac{N_1 \log G.M._1 + N_2 \log G.M._2}{N_1 + N_2}\right]$$

*Croxton and Cowden, Applied General Statistics.
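The combining formula works because $\log G.M.$ of each series is its mean logarithm, so $N_k \log G.M._k$ recovers that series' total of logarithms. A Python sketch with two small made-up series (hypothetical data, chosen only so that the answer can be checked against the G.M. of the pooled items); the function names are illustrative:

```python
import math


def combined_geometric_mean(groups):
    """groups: (N_k, G.M._k) pairs -> Antilog[sum(N_k log G.M._k) / sum(N_k)]."""
    total_n = sum(n for n, _ in groups)
    return 10 ** (sum(n * math.log10(g) for n, g in groups) / total_n)


def geometric_mean(xs):
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


a, b = [2, 8], [4, 16, 64]  # hypothetical series with G.M. 4 and 16
combined = combined_geometric_mean([(len(a), geometric_mean(a)), (len(b), geometric_mean(b))])
print(combined, geometric_mean(a + b))  # the two agree
```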


Limitations. 1. It is difficult to understand. 


2. It is difficult to compute and to interpret.

3. It cannot be computed when there are both negative and positive 
values in a series or one or more of the values is zero. 

4. The geometric mean has a very restricted application because of 
the above reasons. 


E. HARMONIC MEAN 


The harmonic mean is based on the reciprocals of the numbers
averaged. It is defined as the reciprocal of the arithmetic mean of the
reciprocals of the individual observations. Thus, by definition

$$H.M. = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_N}}$$

When the number of items is large, the computation of harmonic mean
in the above manner becomes tedious. To simplify calculations we obtain the
reciprocals of the various items from the table and apply the following
formulae:

In individual observations, $H.M. = \dfrac{N}{\Sigma \left(\frac{1}{X}\right)}$

In discrete series, $H.M. = \dfrac{N}{\Sigma \left(f \times \frac{1}{X}\right)}$

In continuous series, $H.M. = \dfrac{N}{\Sigma \left(f \times \frac{1}{m}\right)}$

Calculation of Harmonic Mean—Individual Observations 


In individual series harmonic mean is computed by applying the
following formula:

$$H.M. = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_N}} = \frac{N}{\Sigma \left(\frac{1}{X}\right)}$$

where $X_1, X_2, X_3$, etc., refer to the various items of the variable.

Illustration 37. Calculate the harmonic average of the following:
1, 0.5, 10, 45, 175, 0.01, 4, 11.2                      (B. Com., Mysore, 1967)
Solution:           CALCULATION OF HARMONIC MEAN

X        1/X          X        1/X
1        1.0000       175      0.0057
0.5      2.0000       0.01     100.0000
10       0.1000       4        0.2500
45       0.0222       11.2     0.0893
                          Σ(1/X) = 103.4672

$$H.M. = \frac{N}{\Sigma (1/X)} = \frac{8}{103.4672} = 0.0773$$
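A minimal Python version of the individual-series formula, cross-checked against the standard library's `statistics.harmonic_mean`. The positivity check is an assumption of this sketch, since reciprocals of zero are undefined.

```python
import statistics


def harmonic_mean(xs):
    """N / sum(1/X); items are assumed to be positive."""
    if any(x <= 0 for x in xs):
        raise ValueError("harmonic mean needs positive items")
    return len(xs) / sum(1 / x for x in xs)


items = [1, 0.5, 10, 45, 175, 0.01, 4, 11.2]  # Illustration 37
print(round(harmonic_mean(items), 4))
print(round(statistics.harmonic_mean(items), 4))  # standard-library cross-check
```

The single small item 0.01 contributes 100 of the 103.47 total, which shows how heavily the harmonic mean weights small items.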
Calculation of Harmonic Mean— Discrete Series 


In discrete series, harmonic mean is computed by applying the
following formula:

$$H.M. = \frac{N}{\Sigma \left(f \times \frac{1}{X}\right)}$$

Steps. (i) Take the reciprocals of the various items of the variable X.

(ii) Multiply the reciprocals by the respective frequencies and obtain the
total $\Sigma \left(f \times \frac{1}{X}\right)$.

(iii) Substitute the values of N and $\Sigma \left(f \times \frac{1}{X}\right)$ in the above formula.

Note. Instead of first finding out the reciprocals and then multiplying
them by frequencies, it will be far easier to divide each frequency
by the respective value of the variable.


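Illustration 37. From the following data compute the value of harmonic mean:

Marks              10    20    25    40    50
No. of students    20    30    50    15     5

Solution:              CALCULATION OF HARMONIC MEAN

Marks (X)   No. of students (f)    f/X
10                 20             2.000
20                 30             1.500
25                 50             2.000
40                 15             0.375
50                  5             0.100
                N = 120     Σ(f/X) = 5.975

$$H.M. = \frac{N}{\Sigma (f/X)} = \frac{120}{5.975} = 20.08$$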
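Following the Note (divide each frequency by its value instead of multiplying by reciprocals), the discrete-series harmonic mean for the marks data above can be sketched in Python as follows; the function name is illustrative.

```python
def harmonic_mean_discrete(xs, fs):
    """N / sum(f / X): divide each frequency by its value, as the Note suggests."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


marks = [10, 20, 25, 40, 50]
students = [20, 30, 50, 15, 5]
print(round(harmonic_mean_discrete(marks, students), 2))
```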
Calculation of Harmonic Mean - Continuous Series

For calculating harmonic mean in continuous series the procedure
is the same as applied to discrete series. The only difference is that here
we take the reciprocals of the mid-points.

Illustration 38. From the following data compute the value of harmonic mean:

Class interval   10-20   20-30   30-40   40-50   50-60
Frequency          4       6      10       7       3

Solution:              CALCULATION OF HARMONIC MEAN

Class interval   Frequency   Mid-points    f/m
                     f           m
10-20                4          15        0.2667
20-30                6          25        0.2400
30-40               10          35        0.2857
40-50                7          45        0.1556
50-60                3          55        0.0545
                  N = 30              Σ(f/m) = 1.0025

$$H.M. = \frac{N}{\Sigma (f/m)} = \frac{30}{1.0025} = 29.93$$
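The continuous-series version only replaces each item by its class mid-point. A Python sketch (function name illustrative) for Illustration 38:

```python
def harmonic_mean_continuous(lowers, width, freqs):
    """N / sum(f / m), where m is the mid-point of each class."""
    mids = [lower + width / 2 for lower in lowers]
    return sum(freqs) / sum(f / m for f, m in zip(freqs, mids))


hm = harmonic_mean_continuous([10, 20, 30, 40, 50], 10, [4, 6, 10, 7, 3])
print(round(hm, 2))
```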
m 


Special Uses of Harmonic Mean 


The harmonic mean is restricted in its field of usefulness. It is useful
for computing the average rate of increase of profits of a concern or the
average speed at which a journey has been performed or the average
price at which an article has been sold. The rate usually indicates the
relation between two different types of measuring units that can be
expressed reciprocally. For example, if a man walked 20 km. in 5 hours,
the rate of his walking speed can be expressed as

$$\frac{20 \text{ km.}}{5 \text{ hours}} = 4 \text{ km. per hour}$$

where the unit of the first term is a km. and the unit of the second term
is an hour. Or, reciprocally,

$$\frac{5 \text{ hours}}{20 \text{ km.}} = \frac{1}{4} \text{ hour per km.}$$

where the unit of the first term is an hour and the unit of the second term
is a km.

Illustration 38 (a). An automobile driver travels from a plain to a hill station,
a distance of 100 km., at an average speed of 30 km. per hour. He then makes the return
trip at an average speed of 20 km. per hour. What is his average speed over the entire
distance (200 km.)?

Solution. (a) If the problem is given to a layman, he is most likely to compute
the arithmetic mean of the two speeds, i.e.,

$$\bar{X} = \frac{30 + 20}{2} = \frac{50}{2} = 25 \text{ km.p.h.}$$

But this is not the correct average. Harmonic mean would be more suitable.
Harmonic mean of 30 and 20 is

$$H.M. = \frac{2}{\frac{1}{30} + \frac{1}{20}} = \frac{2}{\frac{2 + 3}{60}} = \frac{2 \times 60}{5} = 24 \text{ km.p.h.}$$

It can be proved that harmonic mean is the correct average in this case by
tabulating the time and distance for each trip separately as follows:

             Distance (km.)   Average speed (km.p.h.)   Time taken
Going             100                  30               3 hours 20 minutes
Returning         100                  20               5 hours
Total             200                                   8 hours 20 minutes

Thus the total time required for covering a distance of 200 km. is 8 hours
20 minutes, which gives us an average speed of 24 km.p.h. and not 25 km.p.h.


The above problem can be changed in such a manner that the arithmetic mean
gives the correct answer. Suppose the driver makes the same trip but it is given
that he travels at 30 km. per hour for half of the time and at 20 km. per hour for the
other half of the time. Now the correct answer about the average speed would be given by
the arithmetic mean, i.e., average speed = (30 + 20)/2 = 25 km.p.h. To verify the result
we again prepare a table of time and distance at each speed:

Speed (km.p.h.)   Distance     Time required
30                120 km.      120/30 = 4 hours
20                 80 km.       80/20 = 4 hours
Total             200 km.       8 hours

Thus he has covered 200 km. in 8 hours. Hence the average speed is 25 km.
per hour.

The above example clearly shows that when distances are the same for the two
speeds, harmonic mean gives the correct answer, but when times are the same, the
arithmetic mean of the rates of speed gives the correct answer.


(b) An aeroplane flies along the four sides of a square at speeds of 1,000, 2,000,
3,000 and 4,000 km. per hour respectively. What is the average speed of the plane in
its flight around the square?

(b) If we compute the arithmetic mean we get the following answer:

$$\bar{X} = \frac{1{,}000 + 2{,}000 + 3{,}000 + 4{,}000}{4} = \frac{10{,}000}{4} = 2{,}500 \text{ km. per hour}$$

However, that is not the correct answer. In such problems harmonic mean is an
appropriate average.

$$H.M. = \frac{4}{\frac{1}{1000} + \frac{1}{2000} + \frac{1}{3000} + \frac{1}{4000}} = \frac{4}{\frac{12 + 6 + 4 + 3}{12{,}000}} = \frac{4 \times 12{,}000}{25} = 1{,}920 \text{ km. per hour}$$

Verification. Suppose one side of the square is 1,000 km. Each side, i.e.,
1,000 km., he covers at average speeds of 1,000, 2,000, 3,000 and 4,000 km. per hour
respectively. From this we can calculate the time taken in covering the entire distance.

Distance (km.)   Speed (km.p.h.)   Time taken
1,000               1,000          60 minutes
1,000               2,000          30 minutes
1,000               3,000          20 minutes
1,000               4,000          15 minutes
Total                             125 minutes

In 125 minutes he covers 4,000 km.
In 60 minutes he would cover 4,000 × 60/125 = 1,920 km.

Thus the average speed over the entire distance is 1,920 and not 2,500 km. per
hour.
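Both parts of Illustration 38 reduce to "total distance divided by total time". The Python sketch below (function names illustrative) computes that directly and shows that it coincides with the harmonic mean when the distances are equal and with the arithmetic mean when the times are equal.

```python
def average_speed(legs):
    """legs: (distance, speed) pairs. Average speed = total distance / total time."""
    total_distance = sum(d for d, _ in legs)
    total_time = sum(d / v for d, v in legs)
    return total_distance / total_time


def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)


trip = [(100, 30), (100, 20)]                           # Illustration 38 (a)
square = [(1000, v) for v in (1000, 2000, 3000, 4000)]  # Illustration 38 (b)
equal_times = [(120, 30), (80, 20)]                     # 4 hours at each speed
print(average_speed(trip), harmonic_mean([30, 20]))
print(average_speed(square), harmonic_mean([1000, 2000, 3000, 4000]))
print(average_speed(equal_times), (30 + 20) / 2)
```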


Weighted Harmonic Mean 


Е ( ao) xv, (Lus) 


lilustration 39. A Cyclist covers his first five km. at an average speed of 
10 km.p.h., another 3 km. at 8 km.p.h. and the last two km. at 5 km.p.h. Find his 
average speed for the entire journey and verify your answer, 

Solution. Weighted harmonic mean would be appropriate here. 


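$$\text{H.M.}_w = \frac{w_1 + w_2 + w_3}{\left(\frac{1}{a} \times w_1\right) + \left(\frac{1}{b} \times w_2\right) + \left(\frac{1}{c} \times w_3\right)}$$

The various speeds are 10 km.p.h., 8 km.p.h. and 5 km.p.h. Hence a, b and c respectively are 10, 8 and 5. The distances covered are respectively 5, 3 and 2 km. Hence w1, w2 and w3 are 5, 3 and 2.

$$\text{H.M.}_w = \frac{10}{\left(\frac{1}{10} \times 5\right) + \left(\frac{1}{8} \times 3\right) + \left(\frac{1}{5} \times 2\right)} = \frac{10}{0.5 + 0.375 + 0.4} = \frac{10}{1.275} = 7.84 \text{ km.p.h.}$$

Thus the average speed for the entire journey is 7.84 km.p.h.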
Verification

Speed         Distance    Time taken
(km.p.h.)     (km.)       (in minutes)
10            5           30.0
8             3           22.5
5             2           24.0
Total         10          76.5

In 76.5 minutes he covers 10 km.
In 1 minute he covers 10/76.5 km.
In 60 minutes he covers (10/76.5) × 60 = 7.84 km.
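A minimal Python sketch of the weighted harmonic mean, Σw / Σ(w/X), applied to the cyclist's speeds with the distances as weights. The helper name `weighted_harmonic_mean` is hypothetical (written out by hand because the weighted form of `statistics.harmonic_mean` is only available in newer Python versions); the second part repeats the verification as total distance over total time.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted H.M. = sum(w) / sum(w / x); all values must be non-zero."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

speeds = [10, 8, 5]      # km.p.h.
distances = [5, 3, 2]    # km., used as weights

avg_speed = weighted_harmonic_mean(speeds, distances)

# Verification: minutes taken on each stretch, then km. per hour overall
total_minutes = sum(60 * d / s for s, d in zip(speeds, distances))
check = sum(distances) / total_minutes * 60

print(round(avg_speed, 2), total_minutes, round(check, 2))
```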


Merits and Limitations of Harmonic Mean
Merits. 1. Its value is based on every item of the series.
2. It lends itself to algebraic manipulation.

3. In problems relating to time and rates it gives better results than
other averages.

Limitations. 1. It is not easily understood.
2. It is difficult to compute.

3. It gives the largest weight to the smallest items. This is generally not
a desirable feature and as such this average is not very useful for the
analysis of economic data.

4. Its value cannot be computed when there are both positive and
negative items in a series or when one or more items are zero.

Because of these limitations, the harmonic mean has very little
practical application and is not a good representation of a statistical series,
unless the phenomenon is one in which small items need to be given a very
high weight.


Relationship among the Averages

In any distribution in which the original items (all positive) differ in size, the values of A.M., G.M. and H.M. would also differ and will be in the following order:

$$\text{A.M.} \geq \text{G.M.} \geq \text{H.M.}$$

i.e., the arithmetic mean is greater than the geometric mean and the geometric mean is greater than the harmonic mean. The equality signs hold only if all the numbers X1, X2, ..., Xn are identical.
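The ordering can be illustrated numerically with a small Python sketch. The two lists are hypothetical sample data, not taken from the text; the sketch only demonstrates the inequality for these items and is no substitute for the proof.

```python
from statistics import geometric_mean, harmonic_mean, mean

def three_means(items):
    """Return (A.M., G.M., H.M.) of a list of positive numbers."""
    if any(x <= 0 for x in items):
        raise ValueError("this sketch assumes all items are positive")
    return mean(items), geometric_mean(items), harmonic_mean(items)

unequal = [2, 4, 8, 16]   # hypothetical items that differ in size
equal = [5, 5, 5, 5]      # hypothetical identical items

am, gm, hm = three_means(unequal)
print(am, round(gm, 4), round(hm, 4))   # A.M. > G.M. > H.M.
print(three_means(equal))               # all three (practically) equal to 5
```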


Proof. Prove that if a and b are two positive numbers, their A.M. ≥ G.M. ≥ H.M.
(M.A. Econ., Lucknow, 1969)

Solution. Let a and b be two positive quantities such that a ≠ b. Then the A.M., G.M. and H.M. of these two quantities are

$$\text{A.M.} = \frac{a+b}{2}; \qquad \text{G.M.} = \sqrt{ab}; \qquad \text{H.M.} = \frac{2ab}{a+b}$$

We have to prove that A.M. > G.M. > H.M. Let us first prove that A.M. > G.M., i.e.,

$$\frac{a+b}{2} > \sqrt{ab} \quad \text{or} \quad a + b > 2\sqrt{ab} \quad \text{or} \quad a + b - 2\sqrt{ab} > 0$$

Since a + b − 2√(ab) = (√a − √b)², this amounts to (√a − √b)² > 0. The square of a real quantity is never negative, and it is zero only when the quantity itself is zero. As a ≠ b, √a − √b ≠ 0, so (√a − √b)² is positive. Hence (a + b)/2 > √(ab).

Let us now prove that G.M. > H.M., i.e.,

$$\sqrt{ab} > \frac{2ab}{a+b} \quad \text{or} \quad 1 > \frac{2\sqrt{ab}}{a+b} \quad \text{or} \quad a + b > 2\sqrt{ab}$$

This has already been proved above. Hence G.M. > H.M.

Since we have shown that A.M. > G.M. and G.M. > H.M., it follows that A.M. > G.M. > H.M.

If a and b are equal, then A.M. = G.M. = H.M. Thus A.M. ≥ G.M. ≥ H.M.

An Application of the Concept of Average

We have discussed above five different averages. Of these the first
three, namely, mean, median and mode, are in common use, the first
named being by far the most popular in statistical work. The other
averages, like the geometric mean and the harmonic mean, though rarely used,
are of significance in special cases. Besides these, two other averages are
discussed below, namely,

1. Moving Average, and
2. Progressive Average.

These averages are useful in the analysis of time series, i.e., chronological data.


Moving Average*

This average is quite popular in the analysis of time series. By
smoothing out the fluctuations in a time series it gives us an idea about the
general long-term tendency of the data.

A moving average is a series of overlapping averages that serve to
approximate the trend of a series by cancelling out the high and low
values. Each average is an ordinary arithmetic mean of several values,
written against the middle year. Thus a moving average is calculated by
using the technique of the simple arithmetic mean.
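A minimal Python sketch of a simple moving average written against the middle year. It borrows the steel sales figures of Illustration 40 purely as sample data and handles odd periods only; an even period would need a further centring step, which this sketch does not attempt. `None` marks the years for which no moving average can be computed.

```python
def moving_average(values, period):
    """Simple moving averages of an odd period, each written against
    the middle item of its window."""
    if period < 1 or period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    result = [None] * len(values)        # None where no average exists
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / period
    return result

sales = [12, 14, 15, 18, 25, 22, 30]     # sample figures from Illustration 40
ma3 = moving_average(sales, 3)
print(ma3)
```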


*For a detailed discussion of this average, please refer to the chapter on 'Time Series Analysis'.

Progressive Average

This average is also based upon the arithmetic mean. It is different
from the moving average in two respects:

(i) It is a cumulative average. In the calculation of this average
all previous figures are added and no previous figure is left out, as is done in
the case of the moving average. If a, b, c, d, ... are the figures for successive
years, the progressive average for the first year would remain a; the progressive
average for the second year is (a + b)/2, for the third year (a + b + c)/3, for the
fourth year (a + b + c + d)/4; and so on.


(ii) The average figures can be obtained for all the years for which
the data are given. The moving average, on the other hand, cannot be
computed for all the years. The longer the period of the moving average, the
greater the number of years for which it cannot be computed.

This average is generally used during the early years of the working
of a business. For example, the figures of sales, profits or production of
each successive year may be compared with the respective figures for the
entire previous period in order to find out how the business is growing.

The process of computing the average will be clear from the following example.


Illustration 40. Calculate the progressive average from the following data:

Year                       1960   1961   1962   1963   1964   1965   1966
Sales of finished steel
(m. tonnes)                  12     14     15     18     25     22     30


Solution: COMPUTATION OF PROGRESSIVE AVERAGE

Year    Sales          Progressive    Items       Progressive
        (m. tonnes)    total          included    average
1960    12              12            1           12.00
1961    14              26            2           13.00
1962    15              41            3           13.67
1963    18              59            4           14.75
1964    25              84            5           16.80
1965    22             106            6           17.67
1966    30             136            7           19.43

The progressive average makes it clear that the above company is steadily
progressing year after year.
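The progressive average is simply a running total divided by the number of items included so far. A short Python sketch, assuming the sales figures of Illustration 40, reproduces the last column of the table:

```python
from itertools import accumulate

def progressive_averages(values):
    """Cumulative (progressive) average: mean of all figures up to each year."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]

years = range(1960, 1967)
sales = [12, 14, 15, 18, 25, 22, 30]   # m. tonnes
averages = progressive_averages(sales)
for year, avg in zip(years, averages):
    print(year, f"{avg:.2f}")
```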


MISCELLANEOUS ILLUSTRATIONS 


Illustration 41. The following figures show the number of passengers carried on
each of 50 journeys by an aircraft with a seating capacity of 100. Calculate:

(a) the average capacity used, and (b) if 65 passengers is the smallest profitable
load, the proportion of flights which were unprofitable.


10 18 61 65 72 TRIUT ES 75 45 66 68 95 92 
31 4l 36 — 33 T8 65 0075 74 sw & 68 32 46 
Ve MEC RS КАЗ CES 69,12. „84 ..89. 902. 75 84 
68 72 45 45 39 [ү 3 38 6 69 72 
ΣX = 2,928

Solution:
(a) $$\bar{X} = \frac{\sum X}{N} = \frac{2{,}928}{50} = 58.56$$

Thus the average capacity used is 58.56 passengers, i.e., 58.56 per cent of the seating capacity of 100.

(b) The number of flights that carried fewer than 65 passengers is 23. Hence the
proportion of flights which were unprofitable would be 23/50 × 100 = 46 per cent.


Illustration 42. The following are the monthly salaries in rupees of 30 employees
of a firm:


91 139 1326 . 19. Ө 87 965 77 99 95 
108 127 86, 148 16 7 69 88 12 118 
89 16 97 105 95 80 86 106 93 135 


The firm gave bonuses of Rs. 10, 15, 20, 25, 30, 35, 40, 45 and 50 to employees
in the respective salary groups: exceeding 60 but not exceeding 70, exceeding 70 but
not exceeding 80, and so on up to exceeding 140 but not exceeding 150. Find the
average bonus paid per employee. (B. Com., Delhi, 1969)

Solution: CLASSIFICATION OF GIVEN DATA

Monthly salaries (Rs.)    Frequency
61-70                      2
71-80                      3
81-90                      5
91-100                     7
101-110                    3
111-120                    5
121-130                    2
131-140                    2
141-150                    1
                       N = 30


AVERAGE BONUS PAID

Bonus (Rs.)    No. of employees    fX
X              f
10             2                    20
15             3                    45
20             5                   100
25             7                   175
30             3                    90
35             5                   175
40             2                    80
45             2                    90
50             1                    50
               N = 30        ΣfX = 825

$$\bar{X} = \frac{\sum fX}{N} = \frac{825}{30} = 27.5$$

Thus the average bonus paid per employee = Rs. 27.5.
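The same weighted arithmetic mean, ΣfX / N, can be sketched in a few lines of Python using the bonus and frequency columns of the table above:

```python
bonus = [10, 15, 20, 25, 30, 35, 40, 45, 50]   # Rs. paid in each salary group
employees = [2, 3, 5, 7, 3, 5, 2, 2, 1]        # frequency f of each group

n = sum(employees)                                            # N
total_bonus = sum(x * f for x, f in zip(bonus, employees))    # sum of fX
average_bonus = total_bonus / n
print(n, total_bonus, average_bonus)
```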


Illustration 43. In 500 small-scale industrial units the return on investment
ranged from 0 to 30 per cent, no unit sustaining any loss. 5 per cent of the units had
returns ranging from 0 per cent up to (and including) 5 per cent, and 15 per cent of the
units earned returns exceeding 5 per cent but not exceeding 10 per cent. The median
rate of return was 15 per cent, and the upper quartile 20 per cent. The uppermost
layer of returns, exceeding 25 per cent, was earned by 50 units.

Present this information in the form of a frequency table with intervals of 5 per
cent, as follows:

Exceeding 0 per cent but not exceeding 5 per cent
Exceeding 5 per cent but not exceeding 10 per cent
Exceeding 10 per cent but not exceeding 15 per cent
Exceeding 15 per cent but not exceeding 20 per cent
Exceeding 20 per cent but not exceeding 25 per cent
Exceeding 25 per cent but not exceeding 30 per cent

Use N/4, N/2 and 3N/4 as the ranks of the lower, middle and upper quartiles
respectively. Find the rate of return around which there is maximum concentration of
the units. (C.A., 1970)

Solution. The information given above can be summarized as follows:

TABLE SHOWING THE DISTRIBUTION OF SMALL-SCALE INDUSTRIAL UNITS
ACCORDING TO THE RATE OF RETURN ON INVESTMENT

Rate of return on investment (%)       % of total firms    Number of firms
Exceeding 0 but not exceeding 5          5                 (a) 25
Exceeding 5 but not exceeding 10        15                 (b) 75
Exceeding 10 but not exceeding 15       (c) 30             (d) 150
Exceeding 15 but not exceeding 20       (e) 25             (f) 125
Exceeding 20 but not exceeding 25       (g) 15             (h) 75
Exceeding 25 but not exceeding 30       10                 50
Total                                  100                 500


On the basis of the information provided, the computations are shown below:
(a) 5% of 500 = 25.
(b) 15% of 500 = 75.

(c) The rate of return of 15%, being the median (N/2), would represent 50% of
the firms. 20% of the data is comprised in the previous classes. Thus, this class would
represent 30% of the firms.

(d) In consequence, 30% of 500 = 150.

(e) Firms having 20% return constitute the upper quartile (3N/4). This shall
cover 75% of the data. The preceding classes representing 50% of the
firms, this class will cover 25% of the data.

(f) In consequence, 25% of 500 = 125.
(g) The residual balance of the given data equals 100 − (5 + 15 + 30 + 25 + 10) = 15%.
(h) 15% being the residual balance, it represents 75 firms.

The rate of return around which there is maximum concentration is the mode.
The mode lies in the class 10-15, which has the largest frequency.

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

L = 10, Δ1 = 150 − 75 = 75, Δ2 = 150 − 125 = 25, i = 5.

$$\text{Mode} = 10 + \frac{75}{75 + 25} \times 5 = 10 + 3.75 = 13.75$$

Hence the rate of return around which there is maximum concentration of units
is 13.75%.
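A minimal Python sketch of the whole chain of reasoning: build the percentage column from the given median, upper quartile and top-class information, convert to numbers of firms, and apply the interpolation formula for the mode. The helper name `grouped_mode` is hypothetical, and the sketch assumes the modal class is neither the first nor the last class.

```python
N = 500                            # number of units

pct = [5, 15]                      # 0-5 and 5-10 per cent, given directly
pct.append(50 - sum(pct))          # 10-15: median at 15% marks 50 per cent
pct.append(75 - 50)                # 15-20: upper quartile at 20% marks 75 per cent
top = 50 / N * 100                 # 25-30: 50 units = 10 per cent
pct.append(100 - sum(pct) - top)   # 20-25: the residual
pct.append(top)

firms = [p * N / 100 for p in pct]

def grouped_mode(lower, f_modal, f_before, f_after, width):
    """Mode = L + D1 / (D1 + D2) * i."""
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    return lower + d1 / (d1 + d2) * width

k = firms.index(max(firms))        # modal class: largest frequency
mode = grouped_mode(5 * k, firms[k], firms[k - 1], firms[k + 1], 5)
print(firms, mode)
```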

Illustration 44. (a) The mean annual salary paid to all employees of a company 
was Rs. 5,000. The mean annual salaries paid to male and female employees were 
Rs. 5,200 and Rs. 4,200 respectively. Determine the percentage of males and females 
employed by the company. (B.A. Hons., Econ., Delhi, 1973)
Solution. (a) Let N1 represent the percentage of males and N2 the percentage of females,
so that N1 + N2 = 100.
We are given X̄12 = 5,000; X̄1 = 5,200; X̄2 = 4,200.
Substituting the values in the formula:

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

$$5{,}000 = \frac{N_1(5{,}200) + N_2(4{,}200)}{100}$$

5,00,000 = 5,200N1 + (100 − N1)(4,200)    [Since N1 + N2 = 100, N2 = 100 − N1]
or 5,00,000 = 5,200N1 + 4,20,000 − 4,200N1
1,000N1 = 80,000
N1 = 80 and N2 = 100 − N1 = 100 − 80 = 20.

Hence 80 per cent of the employees are males and 20 per cent are females.
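Solving the combined-mean formula for the share of the first group gives N1 as a percentage directly: 100 × (X̄12 − X̄2) / (X̄1 − X̄2). A short Python sketch of this rearrangement (the function name is hypothetical):

```python
def percentage_in_group_1(combined_mean, mean_1, mean_2):
    """Solve combined = (N1*mean_1 + N2*mean_2) / (N1 + N2) for the
    percentage N1 / (N1 + N2) * 100."""
    return 100 * (combined_mean - mean_2) / (mean_1 - mean_2)

males = percentage_in_group_1(5000, 5200, 4200)
females = 100 - males
print(males, females)
```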


(b) $$\text{Correct } \bar{X} = \frac{\text{Correct } \sum X}{N}, \qquad \sum X = N\bar{X}$$

N = 75, X̄ = 27
∴ ΣX = 75 × 27 = 2,025
Correct ΣX = 2,025 − 43 + 53 = 2,035

$$\text{Correct } \bar{X} = \frac{2{,}035}{75} = 27.13$$
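The correction can be sketched in Python: rebuild the total from the wrong mean, swap the value 43 for 53 (as in the solution above), and divide again.

```python
n, wrong_mean = 75, 27
wrong_total = n * wrong_mean               # sum of X from the wrong mean
correct_total = wrong_total - 43 + 53      # remove 43, put in 53
correct_mean = correct_total / n
print(wrong_total, correct_total, round(correct_mean, 2))
```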


The values of the median and mode will not be affected by this error because these
are positional averages, and the median value is 29, which is far away from the values
43 and 53. Similarly the value of the mode would not be affected by the error since mode


Illustration 45. Draw a histogram from the following data and measure the
modal value:


Size class Frequency Size class Frequency 

0—10 5 50— 60 10 
10—20 п 60— 70 8 
20—30 19 70— 80 6 
30—40 at 80— 90 3 
40—50 16 90—100 


1 
(B. Com., Delhi, 1973)
Solution : 


HISTOGRAM 


E-7:62 MEASURES OF CENTRAL VALUK 
Illustration 46. Calculate the median and mode from the data given below:

Wages (Rs.)    No. of workers    Wages (Rs.)    No. of workers
Above 30       520               Above 70       105
Above 40       470               Above 80        45
Above 50       399               Above 90         7
Above 60       210

(B. Com., Andhra, 1970)
Solution: CALCULATION OF MEDIAN AND MODE

Wages (Rs.)    No. of workers    c.f.
30-40           50                50
40-50           71               121
50-60          189               310
60-70          105               415
70-80           60               475
80-90           38               513
90-100           7               520
           N = 520


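Median = Size of (N/2)th item = Size of (520/2)th = 260th item

Median lies in the class 50-60.

$$\text{Med.} = L + \frac{\frac{N}{2} - \text{c.f.}}{f} \times i$$

L = 50, N/2 = 260, c.f. = 121, f = 189, i = 10

$$\text{Med.} = 50 + \frac{260 - 121}{189} \times 10 = 50 + \frac{1{,}390}{189} = 57.35$$

$$\text{Mo} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

By inspection the mode lies in the class 50-60.
L = 50, Δ1 = (189 − 71) = 118; Δ2 = (189 − 105) = 84, i = 10

$$\text{Mo} = 50 + \frac{118}{118 + 84} \times 10 = 50 + \frac{1{,}180}{202} = 55.84$$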
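A Python sketch of the same computation, starting from the "more than" cumulative figures. Class frequencies are obtained as differences of successive "above" counts (nobody being assumed to earn above 100), then the median and mode interpolation formulas are applied. The helper name is hypothetical, and the mode step assumes the modal class is neither the first nor the last.

```python
# 'More than' cumulative frequencies: number of workers earning above each wage
above = {30: 520, 40: 470, 50: 399, 60: 210, 70: 105, 80: 45, 90: 7}
width = 10

lowers = sorted(above)                                   # 30, 40, ..., 90
# Workers in class L to L+10 = (above L) - (above L+10); none above 100
freq = [above[L] - above.get(L + width, 0) for L in lowers]
N = above[lowers[0]]

def grouped_median(lowers, freq, width, n):
    """Median = L + (N/2 - c.f.) / f * i."""
    half, cf = n / 2, 0
    for L, f in zip(lowers, freq):
        if cf + f >= half:
            return L + (half - cf) / f * width
        cf += f
    raise ValueError("median not found")

median = grouped_median(lowers, freq, width, N)

# Modal class by inspection: the one with the largest frequency
k = freq.index(max(freq))
d1, d2 = freq[k] - freq[k - 1], freq[k] - freq[k + 1]
mode = lowers[k] + d1 / (d1 + d2) * width
print(freq, round(median, 2), round(mode, 2))
```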


Illustration 47. Find the class intervals if the arithmetic mean of the following
distribution is 33 and the assumed mean 35:

Step deviation   −3   −2   −1    0   +1   +2
Frequency         5   10   25   30   20   10

Solution: DETERMINATION OF CLASS INTERVALS

Step deviation    Frequency    fd′
d′                f
−3                 5           −15
−2                10           −20
−1                25           −25
 0                30             0
+1                20           +20
+2                10           +20
              N = 100     Σfd′ = −20

$$\bar{X} = A + \frac{\sum fd'}{N} \times i$$

A = 35, X̄ = 33, N = 100, Σfd′ = −20

Substituting the values:

$$33 = 35 + \frac{-20}{100} \times i$$

33 = 35 − 0.2i
0.2i = 2
i = 10. Thus the class interval = 10.
The assumed mean is the mid-value of the class with 0 as step deviation. The lower
and upper limits of this class are:

35 − 10/2 = 30 and 35 + 10/2 = 40, i.e., 30-40. The other classes will be:

0-10, 10-20, 20-30, 30-40, 40-50, 50-60.

Note: Since all the step deviations show equal gaps, c = i in the formula for
calculating the arithmetic mean. As we are required to determine the class interval, we have
substituted i in place of c.
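The step-deviation formula can be solved for i and the classes rebuilt around the assumed mean. A minimal Python sketch, using exact fractions so that no rounding creeps in:

```python
from fractions import Fraction

d = [-3, -2, -1, 0, 1, 2]          # step deviations
f = [5, 10, 25, 30, 20, 10]        # frequencies
A, actual_mean = 35, 33            # assumed mean and given arithmetic mean

N = sum(f)
sum_fd = sum(fi * di for fi, di in zip(f, d))
# mean = A + sum_fd / N * i   =>   i = (mean - A) * N / sum_fd
i = Fraction(actual_mean - A) * N / sum_fd

# The class with step deviation 0 has the assumed mean as its mid-value
classes = [(A + di * i - i / 2, A + di * i + i / 2) for di in d]
print(i, [(int(lo), int(hi)) for lo, hi in classes])
```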


Illustration 48. Calculate the mode from the following series:

Marks     No. of students    Marks     No. of students
0-2        8                 25-30     45
2-4       12                 30-40     60
4-10      20                 40-50     20
10-15     10                 50-60     13
15-20     16                 60-80     15
20-25     25                 80-100     4

Solution. Before calculating the mode we will have to adjust the class intervals so
as to make them equal. The adjusted classes would be as follows:

Marks     No. of students
0-20      8 + 12 + 20 + 10 + 16 = 66
20-40     25 + 45 + 60 = 130
40-60     20 + 13 = 33
60-80     15
80-100    4

Mode lies in the class 20-40.

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

L = 20, Δ1 = 130 − 66 = 64, Δ2 = 130 − 33 = 97, i = 20

$$\text{Mode} = 20 + \frac{64}{64 + 97} \times 20 = 20 + 7.95 = 27.95$$
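A Python sketch of the regrouping step followed by the mode formula. It assumes, as the solution does, that every original class fits entirely inside one of the new 20-mark classes, and raises an error otherwise. If the modal class were the first or last, a frequency of 0 is taken for the missing neighbour.

```python
# Original unequal classes: (lower limit, upper limit, no. of students)
classes = [(0, 2, 8), (2, 4, 12), (4, 10, 20), (10, 15, 10), (15, 20, 16),
           (20, 25, 25), (25, 30, 45), (30, 40, 60), (40, 50, 20),
           (50, 60, 13), (60, 80, 15), (80, 100, 4)]
width = 20

# Merge into equal classes 0-20, 20-40, ... (each old class must fit in one)
regrouped = {}
for lo, hi, f in classes:
    start = (lo // width) * width
    if hi > start + width:
        raise ValueError(f"class {lo}-{hi} crosses a boundary of the new classes")
    regrouped[start] = regrouped.get(start, 0) + f

lowers = sorted(regrouped)
freq = [regrouped[L] for L in lowers]

k = freq.index(max(freq))                        # modal class
before = freq[k - 1] if k > 0 else 0
after = freq[k + 1] if k + 1 < len(freq) else 0
d1, d2 = freq[k] - before, freq[k] - after
mode = lowers[k] + d1 / (d1 + d2) * width
print(freq, round(mode, 2))
```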


Illustration 49. The median and mode of the following wage distribution are
known to be Rs. 33.5 and Rs. 34 respectively. Three frequency values from the table
are, however, missing. Find these missing values.

Wages in Rs.    Frequencies
0-10             4
10-20           16
20-30            ?
30-40            ?
40-50            ?
50-60            6
60-70            4
Total          230

(T.D.C. 2nd yr., Raj., 1975)
Solution. Let the missing frequencies be:

20-30    x
30-40    y
40-50    230 − 30* − x − y

Since the median and mode are 33.5 and 34, they both lie in the class 30-40.

$$\text{Med.} = L + \frac{\frac{N}{2} - \text{c.f.}}{f} \times i$$

Med. = 33.5, L = 30, N/2 = 115, c.f. = 20 + x, f = y, i = 10

$$33.5 = 30 + \frac{115 - (20 + x)}{y} \times 10 = 30 + \frac{95 - x}{y} \times 10$$

33.5y − 30y = 950 − 10x
3.5y = 950 − 10x
0.35y = 95 − x
or 0.35y + x = 95    ...(i)

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

Mode = 34, L = 30; Δ1 = (y − x); Δ2 = [y − (230 − 30 − x − y)] = 2y + x − 200; i = 10

$$34 = 30 + \frac{y - x}{(y - x) + (2y + x - 200)} \times 10 = 30 + \frac{10y - 10x}{3y - 200}$$

4(3y − 200) = 10y − 10x
12y − 800 = 10y − 10x
or 2y + 10x = 800    ...(ii)

Multiplying Eqn. (i) by 10:

3.5y + 10x = 950
2y + 10x = 800

Subtracting, 1.5y = 150
y = 100

Substituting the value of y in Eqn. (i),

0.35 × 100 + x = 95
35 + x = 95
x = 60

Thus the missing frequencies are:

Class intervals    Frequencies
20-30              x = 60
30-40              y = 100
40-50              200 − 60 − 100 = 40
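Equations (i) and (ii) can be set up in general form and solved exactly in Python. This sketch uses Cramer's rule for the two linear equations (the text eliminates x by subtraction; the result is the same), with fractions to avoid rounding:

```python
from fractions import Fraction

N, known = 230, 30          # total frequency; sum of known frequencies 4+16+6+4
R = N - known               # x + y + z = 200
L, i = 30, 10               # median and mode both lie in the class 30-40
median, mode = Fraction("33.5"), Fraction(34)
cf_before = 4 + 16          # frequencies below the class 20-30

# Median:  median = L + (N/2 - cf_before - x) / y * i
#   ->  i*x + (median - L)*y = (N/2 - cf_before)*i
a1, b1, c1 = i, median - L, (Fraction(N, 2) - cf_before) * i
# Mode:  mode = L + (y - x) / ((y - x) + (y - z)) * i, where z = R - x - y
#   ->  i*x + (3*(mode - L) - i)*y = (mode - L)*R
a2, b2, c2 = i, 3 * (mode - L) - i, (mode - L) * R

det = a1 * b2 - a2 * b1
x = (c1 * b2 - c2 * b1) / det       # Cramer's rule
y = (a1 * c2 - a2 * c1) / det
z = R - x - y
print(x, y, z)
```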


* 30 is the sum of the known frequencies, i.e., 4 + 16 + 6 + 4 = 30.

Illustration 50. The following is the age distribution of 2,000 persons working
in a large textile mill:
Age group               No. of persons    Age group               No. of persons
15 but less than 20      80               45 but less than 50     268
20 but less than 25     250               50 but less than 55     150
25 but less than 30     300               55 but less than 60      75
30 but less than 35     325               60 but less than 65      25
35 but less than 40     287               65 but less than 70      20
40 but less than 45     220

Because of heavy losses the management decides to bring down the strength to 40%
of the present number according to the following scheme:

(i) To retrench the first 10% from the lower age groups.
(ii) To absorb the next 40% in other branches.
(iii) To make 10% from the highest age groups retire prematurely.

What will be the age limits of the persons retained in the mill and of those
transferred to other branches? Also calculate the average age of those retained.


Solution. The number of persons to be retrenched from the lower group
= 2,000 × 10/100 = 200. Eighty of these will be from the 15-20 age group and the rest
(200 − 80) = 120 from the 20-25 age group.

The persons to be absorbed in other branches = 2,000 × 40/100 = 800. They belong to
the following age groups:

Age group    No. of persons
20-25        (250 − 120) = 130
25-30        300
30-35        325
35-40        (287 − 242) = 45
Total        800

Those who are to retire are 2,000 × 10/100 = 200 in all, and they belong to the highest age groups.
Their age groups are:

Age group    No. of persons
65-70        20
60-65        25
55-60        75
50-55        (150 − 70) = 80
Total        200

Hence those retained in the mill lie within the age limits 35-55, and those transferred to other branches within 20-40. The persons retained are:

Age group    No. of persons
35-40        242
40-45        220
45-50        268
50-55         70
Total        800*

* This is 40% of the total, i.e., 2,000.


CALCULATION OF AVERAGE AGE OF THOSE RETAINED

Age group    f      m.p.    (m − 47.5)/5    fd′
                    m       d′
35-40        242    37.5    −2              −484
40-45        220    42.5    −1              −220
45-50        268    47.5     0                 0
50-55         70    52.5    +1               +70
         N = 800                      Σfd′ = −634

$$\bar{X} = A + \frac{\sum fd'}{N} \times i$$

A = 47.5, Σfd′ = −634, N = 800, i = 5

$$\bar{X} = 47.5 - \frac{634}{800} \times 5 = 47.5 - 3.96 = 43.54 \text{ years}$$
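The retrenchment scheme can be simulated in Python: remove persons from the youngest end, then the next block for transfer, then from the oldest end, and average what remains using mid-values. The helper `take_from_front` is hypothetical, and the sketch assumes each group spans 5 years so that its mid-value is the lower limit plus 2.5.

```python
# (lower limit of age group, number of persons); each group spans 5 years
groups = [(15, 80), (20, 250), (25, 300), (30, 325), (35, 287), (40, 220),
          (45, 268), (50, 150), (55, 75), (60, 25), (65, 20)]
total = sum(f for _, f in groups)

def take_from_front(groups, count):
    """Remove `count` persons starting from the first group listed.
    Returns (removed groups, remaining groups)."""
    removed, remaining = [], []
    for lower, f in groups:
        k = min(f, count)
        count -= k
        if k:
            removed.append((lower, k))
        if f - k:
            remaining.append((lower, f - k))
    return removed, remaining

retrenched, rest = take_from_front(groups, total * 10 // 100)   # youngest 10%
transferred, rest = take_from_front(rest, total * 40 // 100)    # next 40%
retired, rest = take_from_front(rest[::-1], total * 10 // 100)  # oldest 10%
retained = rest[::-1]

n_retained = sum(f for _, f in retained)
mean_age = sum((lower + 2.5) * f for lower, f in retained) / n_retained
print(retained, round(mean_age, 2))
```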


Illustration 51. The mean wage of 100 labourers working in a factory running
two shifts of 60 and 40 workers respectively is Rs. 38. The mean wage of 60 labourers
working in the morning shift is Rs. 40. Find the mean wage of 40 labourers working
in the evening shift. (B. Com., Delhi, 1972)

Solution. We know that

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

X̄12 = 38, N1 = 60, N2 = 40, X̄1 = 40

$$38 = \frac{(60 \times 40) + 40\bar{X}_2}{60 + 40}$$

38 × 100 = 2,400 + 40X̄2
40X̄2 = 3,800 − 2,400 = 1,400
X̄2 = 1,400/40 = 35

Thus the mean wage of 40 workers working in the evening shift is Rs. 35.
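Rearranging the combined-mean formula for the unknown group mean gives X̄2 = (X̄12(N1 + N2) − N1X̄1) / N2. A one-function Python sketch (the function name is hypothetical):

```python
def missing_group_mean(combined_mean, n1, mean1, n2):
    """Solve combined = (n1*mean1 + n2*mean2) / (n1 + n2) for mean2."""
    return (combined_mean * (n1 + n2) - n1 * mean1) / n2

evening_mean = missing_group_mean(38, 60, 40, 40)
print(evening_mean)
```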


Illustration 52. The following marks have been obtained by 11 students in 3 papers
of accountancy. In which paper is the general level of knowledge of the students
highest? Give reasons:
4 add gus ccm ТОТ... еа ео ео У 37, 38, 40 
B SL дё 1128, 62%, 645 222-1322. ой. 42 
С 62, 33,840; 2, 87 ОВ У 7446. 0833, 7537.51 


Solution. If we work out the arithmetic mean, we get more or less the same value for
each paper. However, the median is better in such cases.

CALCULATION OF MEDIAN

Marks arranged in ascending order
A     B     C
28    22    08
30    28    29
35    32    33
37    36    33
38    39    40
40    41    44
42    42    51
55    45    53
62    62    62
63    62    62
72    90    87

Median = Size of ((N + 1)/2)th item = Size of ((11 + 1)/2)th = 6th item.

The median is 40, 41 and 44 for papers A, B and C respectively. Hence the general
level of knowledge of students is highest in paper C.


Illustration 53. A machine was purchased for Rs. 5 lakhs in 1970. Depreciation
on the diminishing balance was charged at 40% in the first year, 25 per cent in the
second year and 10% p.a. during the next three years. What is the average rate of
depreciation during the whole period?

Solution. The cost of the machine will not affect the calculation of the rate of
depreciation and hence it can be ignored. The average rate of depreciation would be
obtained by applying the geometric mean:

Year    Diminishing value taking 100 as base (X)    log X
1970    100 − 40 = 60                                1.7782
1971    100 − 25 = 75                                1.8751
1972    100 − 10 = 90                                1.9542
1973    100 − 10 = 90                                1.9542
1974    100 − 10 = 90                                1.9542
                                           Σ log X = 9.5159

$$\text{G.M.} = \text{A.L.}\left(\frac{\sum \log X}{N}\right) = \text{A.L.}\left(\frac{9.5159}{5}\right) = \text{A.L.}(1.9032) = 80 \text{ (approx.)}$$

Since the diminishing value is Rs. 80, the depreciation will be 100 − 80 = 20%.

Thus the average rate of depreciation charged during the whole period is 20
per cent.
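The geometric mean of the remaining values can be computed directly in Python instead of through logarithm tables. The sketch also checks that one constant rate, applied for all five years, leaves the same final value as the actual rates. Without rounding the G.M. comes out at about 80.02, so the average rate is about 19.98, i.e., 20 per cent.

```python
from math import prod
from statistics import geometric_mean

rates = [40, 25, 10, 10, 10]              # depreciation (%) in 1970-1974
remaining = [100 - r for r in rates]      # value left per 100 each year

gm = geometric_mean(remaining)            # about 80
average_rate = 100 - gm                   # about 20 per cent

# Applying the average rate every year leaves the same final value
final_by_average = (gm / 100) ** 5
final_actual = prod(remaining) / 100 ** 5
print(round(gm, 2), round(average_rate, 2))
```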


Illustration 54. Find the numbers whose arithmetic mean is 12.5 and geometric
mean 10.

Solution. Let the numbers be 'a' and 'b':
G.M. = √(a × b) = 10 ∴ ab = (10)² = 100
A.M. = (a + b)/2 = 12.5 ∴ a + b = 2 × 12.5 = 25
We know that (a + b)² − (a − b)² = 4ab
Substituting the given values, we have
(25)² − (a − b)² = 4 × 100
625 − (a − b)² = 400
or (a − b)² = 625 − 400 = 225
a − b = √225 = 15
a + b = 25    ...(i)
a − b = 15    ...(ii)
Adding, 2a = 40 or a = 20
Substituting the value of 'a' in (i),
b = 25 − 20 = 5

Hence the two numbers are 20 and 5.
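The same steps in Python: the A.M. gives the sum, the G.M. gives the product, and the identity (a + b)² − (a − b)² = 4ab gives the difference. The sketch takes a as the larger number.

```python
import math

am, gm = 12.5, 10
s = 2 * am                            # a + b
p = gm ** 2                           # ab
d = math.sqrt(s ** 2 - 4 * p)         # a - b, from (a + b)^2 - (a - b)^2 = 4ab
a, b = (s + d) / 2, (s - d) / 2       # taking a as the larger number
print(a, b)
```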


Illustration 55. 10 per cent of the workers in a firm employing a total of 1,000
workers earn less than Rs. 5 per day, 200 earn between Rs. 5 and 9.99, 30 per cent between
Rs. 10 and 14.99, 250 workers between Rs. 15 and 19.99 and the rest Rs. 20 and above. What is the
median wage?


Solution. First convert the given figures into a frequency distribution as follows:

Wages (Rs.)      No. of workers (f)            c.f.
Below 5          1,000 × 10/100 = 100          100
5-9.99           200                            300
10-14.99         1,000 × 30/100 = 300           600
15-19.99         250                            850
20 and above     150                          1,000

Median = Size of (N/2)th item = Size of (1,000/2)th = 500th item

The 500th item falls in the class 10-14.99, since the cumulative frequency rises from 300 to 600 in this class. The real lower limit of this class is 9.995 and its width is 5.

$$\text{Med.} = L + \frac{\frac{N}{2} - \text{c.f.}}{f} \times i$$

L = 9.995, N/2 = 500, c.f. = 300, f = 300, i = 5

$$\text{Med.} = 9.995 + \frac{500 - 300}{300} \times 5 = 9.995 + 3.333 = \text{Rs. } 13.33$$
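A minimal Python sketch of the grouped median that first locates the class containing the (N/2)th item from the running cumulative frequency and then interpolates from that class's real lower limit. The real limits (4.995, 9.995, ...) are the ones implied by the wage classes above; the open-end first class is given no lower limit, and the sketch raises an error if the median were to fall there.

```python
def grouped_median(freq, lower_limits, width):
    """Median = L + (N/2 - c.f.) / f * i, where L is the real lower limit
    of the class containing the (N/2)th item."""
    half = sum(freq) / 2
    cf = 0
    for f, lower in zip(freq, lower_limits):
        if cf + f >= half:
            if lower is None:
                raise ValueError("median falls in an open-end class")
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("empty distribution")

freq = [100, 200, 300, 250, 150]                   # workers in each wage class
real_lower = [None, 4.995, 9.995, 14.995, 19.995]  # 'Below 5' has no lower limit
median = grouped_median(freq, real_lower, 5)
print(round(median, 2))
```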


Illustration 56. In a certain examination, the average grade of all students in
class A is 68.4 and of all students in class B is 71.2. If the average of both classes
combined is 70, find the ratio of the number of students in class A to the number in
class B. (B. Com., Bombay, 1973)

Solution. Let us assume that the number of students in class A was X and in
class B it was Y.

We are given X̄12 = 70, X̄1 = 68.4, X̄2 = 71.2.
Substituting these values in the formula:

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} \qquad 70 = \frac{68.4X + 71.2Y}{X + Y}$$

70(X + Y) = 68.4X + 71.2Y
70X − 68.4X + 70Y − 71.2Y = 0
1.6X − 1.2Y = 0
1.6X = 1.2Y
Suppose X = 10*
16 = 1.2Y or Y = 13.33
Thus X and Y are in the ratio of 10 : 13.33 or 30 : 40.

Hence for every 3 students in class A, there are 4 students in class B.

* We can assume X to be anything but the ratio would come out to be the same.
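Rearranging 70(X + Y) = 68.4X + 71.2Y gives the ratio directly: X/Y = (71.2 − 70)/(70 − 68.4). A short Python sketch using exact fractions, so the ratio appears in lowest terms:

```python
from fractions import Fraction

mean_a, mean_b, combined = Fraction("68.4"), Fraction("71.2"), Fraction(70)
# combined*(X + Y) = mean_a*X + mean_b*Y  =>  X/Y = (mean_b - combined) / (combined - mean_a)
ratio = (mean_b - combined) / (combined - mean_a)
print(f"{ratio.numerator} : {ratio.denominator}")
```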


Illustration 57. The arithmetic mean height of 50 students of a college is 5'8".
The height of 30 of these is given in the frequency distribution below. Find the arithmetic
mean height of the remaining 20 students:

Height       5'4"   5'6"   5'8"   5'10"   6'0"
Frequency     4      12     4      8       2

(B.A. Hons., Econ., Delhi, 1975)


Solution: CALCULATION OF ARITHMETIC MEAN

Height (inches)    (X − 68)/2    Frequency    fd′
X                  d′            f
64                 −2             4            −8
66                 −1            12           −12
68                  0             4             0
70                 +1             8            +8
72                 +2             2            +4
                               N = 30    Σfd′ = −8

$$\bar{X} = A + \frac{\sum fd'}{N} \times i = 68 - \frac{8}{30} \times 2 = 68 - \frac{16}{30} = \frac{2{,}024}{30}$$

Mean height of 50 students = 5'8" = 68 inches
Total of the heights of 50 students = 68 × 50 = 3,400
Total of the heights of 30 students = (2,024/30) × 30 = 2,024
Difference = 3,400 − 2,024 = 1,376

Mean of 20 students = 1,376/20 = 68.8 inches, i.e., 5'8.8".
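A Python sketch of the same reasoning: compute the mean of the 30 students by step deviations, turn both means into totals, and divide the difference among the remaining 20 students. Heights are in inches, as in the solution.

```python
heights = [64, 66, 68, 70, 72]    # inches: 5'4", 5'6", 5'8", 5'10", 6'0"
freq = [4, 12, 4, 8, 2]
A, i = 68, 2                      # assumed mean and class interval

d = [(x - A) // i for x in heights]                  # step deviations
n30 = sum(freq)
mean30 = A + sum(f * di for f, di in zip(freq, d)) / n30 * i
total30 = sum(f * x for f, x in zip(freq, heights))

total50 = 68 * 50                 # mean of all 50 students is 68 inches
mean20 = (total50 - total30) / (50 - n30)
print(round(mean30, 2), total30, mean20)
```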

Illustration 58. From the following data of income distribution, calculate the
arithmetic mean. It is given that (i) the total income of persons in the highest income
group is Rs. 435, and (ii) none is earning less than Rs. 20.

Income (Rs.)    No. of persons    Income (Rs.)     No. of persons
Below 30        16                Below 70         87
Below 40        36                Below 80         95
Below 50        61                80 and above      5
Below 60        76

(B. Com., Kurukshetra, 1975)
Solution: CALCULATION OF ARITHMETIC MEAN

The "below" figures are cumulative, so the number of persons in each class is the difference between successive figures.

Income (Rs.)    m.p.    No. of persons    fm
                m       f
20-30           25      16                  400
30-40           35      20                  700
40-50           45      25                1,125
50-60           55      15                  825
60-70           65      11                  715
70-80           75       8                  600
80 and above    -        5                  435*
                        N = 100     Σfm = 4,800

$$\bar{X} = \frac{\sum fm}{N} = \frac{4{,}800}{100} = 48$$

Thus the average income is Rs. 48.
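A Python sketch of the calculation, assuming (as the table headings indicate) that the "below" figures are "less than" cumulative counts. Class frequencies are differences of successive counts; the open-end top class contributes its given total income of Rs. 435 directly instead of a mid-value product.

```python
# 'Less than' cumulative counts; nobody earns below Rs. 20
below = {30: 16, 40: 36, 50: 61, 60: 76, 70: 87, 80: 95}
top_count, top_income = 5, 435       # '80 and above': persons and total income

uppers = sorted(below)
freq, cum_prev = [], 0
for u in uppers:
    freq.append(below[u] - cum_prev)
    cum_prev = below[u]
mids = [u - 5 for u in uppers]       # classes 20-30, ..., 70-80 of width 10

N = sum(freq) + top_count
total = sum(f * m for f, m in zip(freq, mids)) + top_income
mean_income = total / N
print(freq, N, total, mean_income)
```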


* We are given the total income in the highest income group as Rs. 435. This is in
fact fm for the class 80 and above.

Which Average to Use?

We have explained above the methods of computing the various
types of averages and also their distinctive features. At this point the
reader has a right to ask "Which of these averages should I use?" or
"When ought I to use one or the other of the averages described?" or
"Which of these is the best average to be used?"


It must be clearly understood that no one average can be regarded
as the best for all circumstances. The following considerations influence the
selection of an appropriate average:

1. The purpose which the average is designed to serve.

2. Would the average be used for further computations?

3. The type of data available. Are they badly skewed (avoid the
mean), gappy around the middle (avoid the median), or unequal in class
interval (avoid the mode)?

4. The concept of the typical value required by the problem.
Within the framework of descriptive statistics the main requirement is to
know what each average means and then select one that fulfils the purpose
in hand. Is a composite average of all absolute or relative values needed
(arithmetic mean or geometric mean), or is the middle value wanted (median),
or the most common value (mode)?

On occasion, it may even be advisable to work out more than one
average and present them, although, to be sure, this procedure creates an
added burden for the reader as well as for the statistician. But the added
burden is preferable to the use of a single average that may be an incomplete
description. To use it alone is like looking through a keyhole: the part of
the room you can see cannot give a full idea of the whole room.


Median. The median is generally the best average in open-end
grouped distributions, especially where, if plotted as a frequency curve, one
gets a J or reverse J curve; for example, in the case of a price distribution or an
income distribution. In such cases very high or very low values would
cause the mean to be higher or lower than the most common values. In
such instances, the median or middle value of the series may be a more
representative figure to use in describing the mass of data.


Mode. Generally speaking, the principal value of the mode lies in the
fact that it can be used to describe qualitative data. The mode can be
used in problems involving the expression of preferences where quantitative
measurements are not possible. Thus the preferred type of package design
among a number of alternative designs would be the modal design. If we
want to compare consumer preferences for different kinds of products, or
different kinds of advertising, we can compare the modal preferences
expressed by different groups of people, but we cannot calculate the median
or mean.* The mode is a particularly useful average for discrete series, e.g., the
number of people wearing a given size of shoe, or the number of children per
household, etc. The mode is best suited where there is an outstandingly
large frequency.

Geometric Mean. The geometric mean is useful in averaging ratios
and percentages and in computing average rates of increase or decrease.
It is particularly important in Economics and Business Statistics in index
number construction.

* Freund and Williams: Modern Business Statistics.
Harmonic Mean. The harmonic mean is useful in problems in which
values of a variable are compared with a constant quantity of another
variable, i.e., rates, time, distance covered within a certain time, and
quantities purchased or sold per unit, etc.


Arithmetic Mean 


In the following cases the arithmetic mean should not be used:
1. In highly skewed distributions.
2. In distributions with open-end intervals.

3. When the distribution is unevenly spread, concentration being
small or large at irregular points.


4. The arithmetic mean should not be used to average ratios and 
rates of change. In such cases the geometric mean is more suitable. 


5. When there are very large and very small items, arithmetic mean 
would be seriously misleading on account of undue influence from extreme 
items. 


Leaving aside the above specific cases where either median, mode, 
geometric mean or harmonic mean is more appropriate, in other cases we 
should apply as a rule of thumb the arithmetic mean—the most popular 
and widely used average in practice. 


It may also be pointed out that a complete description of a distribution
occasionally calls for two or more of these averages. It is true that
presenting two or more averages creates an added burden for the
investigator as well as for the consumer of statistics. However, the work
this extra burden entails is fully justified if it presents a more complete
description of the data than is possible from a single measure.


GENERAL LIMITATIONS OF AN AVERAGE 


1. Since an average is a single value representing a group of values,
it must be properly interpreted; otherwise, there is every possibility of
jumping to wrong conclusions. This can be best illustrated with the help
of a story. A person had to cross a river from one bank to another. He
was not aware of the depth of the river, so he enquired from another man
who told him that the average depth of the water was 5'4". The man was
5'6" tall, and he thought that he could very easily cross the river because at all
times he would be above the level of the water. So he started. In the beginning
the level of the water was very low, but as he reached the middle, the water
was 15 ft. deep and he lost his life. The man was drowned because he
had the misconception that average depth means uniform depth throughout.
But it is not so. An average represents a group of values and lies somewhere
in between the two extremes, i.e., the largest and the smallest items
of the series.

2. An average may give us a value that does not exist in the data.
For example, the arithmetic mean of 100, 300, 250, 50 and 100 is 160, a
value that does not exist in the data.

3. At times the average may give a very absurd result. For example,
if we are calculating the average size of a family we may get a value of 4.8. But
this is impossible as persons cannot be in fractions. However, we should
remember that it is an average value representing the entire group.

4. Measures of central value fail to give any idea about the formation
of the series. Two or more series may have the same central value
but may differ widely in composition. For example, observe the following
two series:

           Series A    Series B
             150         300
             170         500
             190          20
             210          78
             180           2
Total        900         900
X̄            180         180


5. We must remember that an average is a measure of central
tendency. Hence, unless the data show a clear single concentration of
observations, an average may not be meaningful at all. This evidently
precludes the use of any average to typify a bimodal, a U-shaped or a
J-shaped distribution.




8 | Measures of Variation 


The various measures of central value discussed in the previous
chapter give us one single figure that represents the entire data. But
the averages alone cannot adequately describe a set of observations,
unless all the observations are the same. It is necessary to describe
the variability or dispersion of the observations. Also, in two or more
distributions the central value may be the same but still there can be
wide disparities in the formation of the distributions. Measures of
variation* help us in studying this important characteristic of a
distribution, i.e., the extent to which the items vary from one another and
from some central value. The significance of the measure of variation can
best be appreciated from the following examples:


           Series A    Series B    Series C
             100         100           1
             100         105         489
             100         102           2
             100         103           3
             100          90           5
Total        500         500         500
X̄            100         100         100


Since the arithmetic mean is the same in all the three series, one is likely
to conclude that these series are alike in nature. But a close examination
shall reveal that the distributions differ widely from one another. In series A,
each and every item is perfectly represented by the arithmetic mean or,
in other words, none of the items of series A deviates from the arithmetic
mean and hence there is no dispersion. In series B, only one item is
perfectly represented by the arithmetic mean; the other items vary, but the
variation is very small as compared to series C. In series C, not a single
item is represented by the arithmetic mean and the items vary widely from
one another. In series C dispersion is much greater compared to series B.
Similarly, we may have two groups of labourers with the same mean
salary and yet their distributions may differ widely. The mean salary
may not be so important a characteristic as the variation of the items from
the mean. To the student of social affairs, the mean income is not so
vitally important as to know how this income is distributed. Are a large
number receiving the mean income, or are there a few with enormous
incomes and millions with incomes far below the mean? The three figures
on the next page represent frequency distributions with some of the
characteristics we wish to emphasize here.

The two curves in diagram (a) represent two distributions with the
same mean $\bar{X}$, but with different dispersions. The two curves in (b)
represent two distributions with the same dispersion but with unequal
means, $\bar{X}_1$ and $\bar{X}_2$. Finally, (c) represents two distributions with unequal
means and unequal dispersions.

* The measurement of the scatterness of the mass of figures in a series about an
average is called measure of variation or dispersion (Kafka: Basic Statistics). Measures
of dispersion are also called "averages of the second order" for the reason that these
measures give an average of the differences of the various items from an average.


The measures of central tendency are, therefore, insufficient. They
must be supported and supplemented with other measures. In this chapter,
we shall be concerned with the measures of variation.**

Measures of variation are needed for four basic purposes:
(i) To determine the reliability of an average.
(ii) To serve as a basis for the control of the variability.
(iii) To compare two or more series with regard to their variability.
(iv) To facilitate the use of other statistical measures.

** The question of the direction of the variation will be discussed in the next
chapter.
A brief explanation of these points is given below:

(i) Measures of variation point out how far an average is
representative of the mass. When dispersion is small, the average is a
typical value in the sense that it closely represents the individual values,
and it is reliable in the sense that it is a good estimate of the average in the
corresponding universe. On the other hand, when dispersion is large, the
average is not so typical, and unless the sample is very large, the average
may be quite unreliable.

(ii) Another purpose of measuring dispersion is to determine the nature
and cause of variation in order to control the variation itself. In matters
of health, variations in body temperature, pulse beat and blood pressure
are the basic guides to diagnosis. Prescribed treatment is designed to
control their variation. In industrial production, efficient operation requires
control of quality variation, the causes of which are sought through
inspection and quality-control programmes. Thus measurement of
dispersion is basic to the control of causes of variation. In engineering
problems measures of dispersion are often especially important. In the social
sciences a special problem requiring the measurement of variability is the
measurement of "inequality" of the distribution of income or wealth, etc.

(iii) Measures of dispersion enable a comparison to be made of two or
more series with regard to their variability. The study of variation may
also be looked upon as a means of determining uniformity or consistency.
A high degree of variation would mean little uniformity or consistency,
whereas a low degree of variation would mean great uniformity or
consistency.

(iv) Many powerful analytical tools in statistics, such as correlation
study, the testing of hypotheses, the analysis of fluctuations, techniques of
production control, cost control and so on, are based on measures of
variation of one kind or another.


Properties of a Good Measure of Variation 
A good measure of dispersion should possess, as far as possible, the
following properties*:
(i) It should be simple to understand.
(ii) It should be easy to compute.
(iii) It should be rigidly defined.
(iv) It should be based on each and every item of the distribution.
(v) It should be amenable to further algebraic treatment.
(vi) It should have sampling stability.
(vii) It should not be unduly affected by extreme items.


Methods of Studying Variation 
The following are the important methods of studying variation:
I. The Range.
II. The Interquartile Range and the Quartile Deviation.
III. The Mean Deviation or Average Deviation.
IV. The Standard Deviation, and
V. The Lorenz Curve.


* These properties are the same as those of a good measure of central value. For
details, please refer to the chapter on 'Measures of Central Value'.
Of these, the first two, namely, the range and the quartile deviation, are
positional measures because they depend on the values at particular
positions in the distribution. The next two, the average deviation and
the standard deviation, are called calculated measures of deviation because
all of the values are employed in their calculation, and the last one is a
graphic method.


Absolute and Relative Measures of Variation 


Measures of dispersion may be either absolute or relative. Absolute 
measures of dispersion are expressed in the same statistical unit in which
the original data are given such as rupees, kilograms, tonnes, etc. These 
values may be used to compare the variation in two distributions provided 
the variables are expressed in the same units and of the same average 
size. In case the two sets of data are expressed in different units, however, 
such as maunds of sugar versus tonnes of sugarcane or if the average size 
is very different such as manager's salary versus workers’ salary the 
absolute measures of dispersion are not comparable. In such cases measures 
of relative dispersion should be used. 

A measure of relative dispersion is the ratio of a measure of
absolute dispersion to an appropriate average. It is sometimes called a
coefficient of dispersion, because 'coefficient' means a pure number that is
independent of the unit of measurement. It should be remembered that
while computing the relative dispersion the average used as base should be
the same one from which the absolute deviations were measured. This
means that the arithmetic mean should be used with the standard
deviation, and either the arithmetic mean or the median with the mean deviation.

I. RANGE

Range is the simplest method of studying dispersion. It is defined
as the difference between the value of the largest item and the value of
the smallest item included in the distribution. Symbolically,

$$\text{Range} = L - S$$
where L = Largest item, and
S = Smallest item.

The relative measure corresponding to range, called the coefficient
of range, is obtained by applying the following formula:

$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

If the averages of the two distributions are about the same, a
comparison of the ranges indicates that the distribution with the smaller
range has less dispersion, and the average of that distribution is more 
typical of the group. 

Illustration 1. The following are the prices of shares of A B Co. Ltd. from
Monday to Saturday:

Days          Price (Rs.)     Days          Price (Rs.)
Monday           200          Thursday         160
Tuesday          210          Friday           220
Wednesday        208          Saturday         250

Calculate range and its coefficient.
Solution. Range = L − S
Here L = 250 and S = 160

Range = 250 − 160 = Rs. 90

$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{250 - 160}{250 + 160} = \frac{90}{410} = 0.22$$
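The two formulas above are easy to check by machine. The following is a minimal Python sketch (the book itself contains no code); the function name `range_and_coefficient` is our own, and the prices are those of Illustration 1.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for ungrouped data."""
    largest, smallest = max(values), min(values)
    rng = largest - smallest                               # Range = L - S
    coeff = (largest - smallest) / (largest + smallest)    # (L - S) / (L + S)
    return rng, coeff


# Share prices of A B Co. Ltd., Monday to Saturday (Illustration 1)
prices = [200, 210, 208, 160, 220, 250]
rng, coeff = range_and_coefficient(prices)
print(rng, round(coeff, 2))  # 90 0.22
```

Because the coefficient is a ratio, it would be unchanged if every price were expressed in another unit, which is what makes it usable for comparing series measured differently.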
There are two methods of determining the range from data grouped
into a frequency distribution. The first method is to find the difference
between the upper limit of the highest wage class and the lower limit of the
lowest wage class. The other method is to subtract the mid-point of the
lowest wage class from the mid-point of the highest wage class. In practice,
both methods are used.


Illustration 2. Calculate Coefficient of Range from the following data:

Marks    No. of students     Marks    No. of students
10-20          8             40-50          8
20-30         10             50-60          4
30-40         12

Solution.
$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{60 - 10}{60 + 10} = \frac{50}{70} = 0.714$$
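Both methods of finding the range of grouped data can be sketched the same way. In this hypothetical helper, `classes` holds (lower limit, upper limit) pairs for the marks of Illustration 2; the frequencies are not needed because the range depends only on the extreme classes.

```python
def coefficient_of_range_grouped(classes, use_midpoints=False):
    """classes: list of (lower_limit, upper_limit) pairs in ascending order."""
    lowest, highest = classes[0], classes[-1]
    if use_midpoints:
        # Second method: mid-point of highest class vs mid-point of lowest class
        largest = (highest[0] + highest[1]) / 2
        smallest = (lowest[0] + lowest[1]) / 2
    else:
        # First method: upper limit of highest class vs lower limit of lowest class
        largest, smallest = highest[1], lowest[0]
    return (largest - smallest) / (largest + smallest)


marks = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
print(round(coefficient_of_range_grouped(marks), 3))                      # 0.714
print(round(coefficient_of_range_grouped(marks, use_midpoints=True), 3))  # 0.571
```

The two methods give different answers (50/70 against 40/70), which is why the method used should be stated along with the result.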


Merits and Limitations of Range 


Merits. Amongst all the methods of studying dispersion range is the 
simplest to understand and the easiest to compute. It takes the minimum
time to calculate the value of range. Hence, if one is interested in getting 
a quick, rather than a very accurate picture of variability one may compute 
range. 

Limitations. (i) Range is not based on each and every item of
the distribution. 

(ii) It is subject to fluctuations of considerable magnitude from 
sample to sample. 

(iii) Range cannot tell us anything about the character of the
distribution within the two extreme observations. For example, observe the
following three series:

Series A    6, 46, 46, 46, 46, 46, 46, 46

Series B    6, 6, 6, 6, 46, 46, 46, 46

Series C    6, 10, 15, 25, 30, 32, 40, 46

In all the three series the range is the same, i.e., (46 − 6) = 40. But it
does not mean that the distributions are alike. The range takes no account
of the form of the distribution within the range. Range is, therefore,
most unreliable as a guide to the dispersion of the values within a
distribution.


(iv) Range cannot be computed in case of open-end distributions.
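The point made in limitation (iii) can be checked numerically. The sketch below takes Series A as 6 followed by seven 46s (so that its range is 40, as stated) and, for contrast, computes the quartile deviation discussed in the next section, using Python's `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the (N+1)/4 positions and interpolates between neighbouring items. The helper name `summarise` is ours.

```python
import statistics


def summarise(values):
    """Return (range, quartile deviation) of ungrouped values."""
    rng = max(values) - min(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return rng, (q3 - q1) / 2


series = {
    "A": [6, 46, 46, 46, 46, 46, 46, 46],
    "B": [6, 6, 6, 6, 46, 46, 46, 46],
    "C": [6, 10, 15, 25, 30, 32, 40, 46],
}
for name, values in series.items():
    rng, qd = summarise(values)
    print(f"Series {name}: range = {rng}, Q.D. = {qd}")
```

All three ranges come out as 40, while the quartile deviations are 0.0, 20.0 and 13.375, so the range alone conceals how differently the items are spread.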
Uses of Range 
Despite serious limitations, range is useful in the following cases:


(i) Quality control. The object of quality control is to keep a
check on the quality of the product without 100% inspection. When
statistical methods of quality control are used, control charts are prepared,
and in preparing these charts range plays a very important role. The
idea basically is that if the range (the difference between the largest and
smallest mass-produced items) increases beyond a certain point, the
production machinery should be examined to find out why the items
produced have not followed their usual, more consistent pattern.


(ii) Fluctuations in the share prices. Range is useful in studying
the variations in the prices of stocks and shares and other commodities that
are very sensitive to price changes from one period to another. For
example, by computing range we can get an idea about the range of
variation of, say, gold prices. If the minimum price for 10 gm. in the
year 1967 was Rs. 150 and the maximum price Rs. 197, this at once tells
us about the range of variation, i.e., Rs. 47 (197 − 150).

(iii) Weather forecasts. The meteorological department makes
use of the range in determining, say, the difference between the minimum
temperature and the maximum temperature. This information is of great
concern to the general public because it tells them within what limits
the temperature is likely to vary on a particular day.


(iv) The range is a most commonly used measure of dispersion in 
everyday living. Questions from “How much does your husband make in 


II. THE INTERQUARTILE RANGE OR
THE QUARTILE DEVIATION


The range as a measure of dispersion discussed above has certain
limitations. It is based on two extreme items and it fails to take account
of the scatter within the range. From this there is reason to believe that
if the dispersion of the extreme items is discarded, the limited range thus
established might be more instructive. For this purpose there has been
developed a measure called the interquartile range, the range which includes
the middle 50 per cent of the distribution. That is, one quarter of the
observations at the lower end and another quarter of the observations at
the upper end of the distribution are excluded in computing the
interquartile range. In other words, the interquartile range represents the difference
between the third quartile and the first quartile.

Symbolically:

$$\text{Interquartile Range} = Q_3 - Q_1$$

Very often the interquartile range is reduced to the form of the
semi-interquartile range or quartile deviation by dividing it by 2.

Symbolically,

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

where Q.D. = Quartile Deviation.


Quartile deviation gives the average amount by which the two
quartiles differ from the median. In a symmetrical distribution the two
quartiles are equidistant from the median, i.e.,

$$Q_3 - \text{Med.} = \text{Med.} - Q_1$$

and as such the difference can be taken as a measure of dispersion. The
range Median ± Q.D. covers exactly 50 per cent of the observations.


In reality, however, one seldom finds a series in business and economic
data that is perfectly symmetrical. Nearly all distributions of social
series are asymmetrical. In an asymmetrical distribution, $Q_1$ and $Q_3$ are
not equidistant from the median. As a result, in an asymmetrical distribution
Median ± Q.D. includes only approximately 50 per cent of the observations.
When quartile deviation is very small, it describes high uniformity
or small variation of the central 50% items, and a high quartile deviation
means that the variation among the central items is large.

Quartile deviation is an absolute measure of dispersion. The relative
measure corresponding to this measure, called the coefficient of quartile
deviation, is calculated as follows:

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Coefficient of quartile deviation can be used to compare the degree of
variation in different distributions.
Computation of Quartile Deviation 

The process of computing quartile deviation is very simple since we
only have to compute the values of the upper and lower quartiles. The
following illustrations will clarify the calculations:

Individual Observations

Illustration 3. Find out the value of quartile deviation and its coefficient from
the following data:

Roll No.    1    2    3    4    5    6    7
Marks      20   28   40   12   30   15   50

Solution: CALCULATION OF QUARTILE DEVIATION
Marks arranged in ascending order: 12, 15, 20, 28, 30, 40, 50

$Q_1$ = Size of $\left(\frac{N+1}{4}\right)$th item = Size of $\left(\frac{7+1}{4}\right)$th item = 2nd item
Size of 2nd item is 15. Thus $Q_1 = 15$.
$Q_3$ = Size of $3\left(\frac{N+1}{4}\right)$th item = Size of $3\left(\frac{7+1}{4}\right)$th item = 6th item
Size of 6th item is 40. Thus $Q_3 = 40$.

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{40 - 15}{2} = 12.5$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 15}{40 + 15} = \frac{25}{55} = 0.455$$
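The (N+1)/4 rule used above matches `statistics.quantiles` in Python's standard library with `method="exclusive"` (its default); when the position is fractional, that function interpolates linearly between adjacent items, a refinement this illustration does not need. A minimal sketch (the name `quartile_deviation` is ours):

```python
import statistics


def quartile_deviation(values):
    """Return Q1, Q3, Q.D. and coefficient of Q.D. for ungrouped data."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    qd = (q3 - q1) / 2                  # semi-interquartile range
    coeff = (q3 - q1) / (q3 + q1)       # relative measure
    return q1, q3, qd, coeff


marks = [20, 28, 40, 12, 30, 15, 50]
q1, q3, qd, coeff = quartile_deviation(marks)
print(q1, q3, qd, round(coeff, 3))  # 15.0 40.0 12.5 0.455
```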


Discrete Series 
Illustration 4. Compute coefficient of quartile deviation from the following data:

Marks              10   20   30   40   50   80
No. of students     4    7   15    8    7    2

Solution: CALCULATION OF COEFFICIENT OF QUARTILE DEVIATION

Marks   Frequency   c.f.      Marks   Frequency   c.f.
 10         4         4        40         8        34
 20         7        11        50         7        41
 30        15        26        80         2        43

$Q_1$ = Size of $\left(\frac{N+1}{4}\right)$th item = Size of $\left(\frac{43+1}{4}\right)$th item = 11th item
Size of 11th item is 20. Thus $Q_1 = 20$.
$Q_3$ = Size of $3\left(\frac{N+1}{4}\right)$th item = Size of $3\left(\frac{43+1}{4}\right)$th item = 33rd item
Size of 33rd item is 40. Thus $Q_3 = 40$.

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = \frac{20}{60} = 0.333$$
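For a discrete series the same positions are located through the cumulative frequencies. In this sketch the hypothetical helper `discrete_quartile` returns the first value whose c.f. reaches the position k(N+1)/4; applied to Illustration 4 it reproduces $Q_1 = 20$ and $Q_3 = 40$.

```python
from itertools import accumulate


def discrete_quartile(values, freqs, k):
    """k-th quartile (k = 1 or 3) of a discrete series, located at k(N+1)/4."""
    n = sum(freqs)
    position = k * (n + 1) / 4
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:      # first value whose c.f. reaches the position
            return value
    raise ValueError("position lies beyond the last item")


marks = [10, 20, 30, 40, 50, 80]
students = [4, 7, 15, 8, 7, 2]
q1 = discrete_quartile(marks, students, 1)
q3 = discrete_quartile(marks, students, 3)
print(q1, q3, (q3 - q1) / 2, round((q3 - q1) / (q3 + q1), 3))  # 20 40 10.0 0.333
```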


Continuous Series 
Illustration 5. Compute quartile deviation from the following data:

Farm Size    No. of farms     Farm Size       No. of farms
(acres)                       (acres)
0-40             394          161-200             169
41-80            461          201-240             113
81-120           391          241 and over        148
121-160          334

(B. Com., Delhi, 1967)
Solution: CALCULATION OF QUARTILE DEVIATION

Farm Size   No. of farms    c.f.     Farm Size       No. of farms    c.f.
0-40            394          394     161-200             169        1,749
41-80           461          855     201-240             113        1,862
81-120          391        1,246     241 and over        148        2,010
121-160         334        1,580

$Q_1$ = Size of $\frac{N}{4}$th item = $\frac{2{,}010}{4}$ = 502.5th item.
Hence $Q_1$ lies in the class 41-80. But the real limits of this class are 40.5-80.5.

$$Q_1 = L + \frac{\frac{N}{4} - \text{c.f.}}{f} \times i$$

where L = 40.5, N/4 = 502.5, c.f. = 394, f = 461, i = (80.5 − 40.5) = 40.

$$Q_1 = 40.5 + \frac{502.5 - 394}{461} \times 40 = 40.5 + 9.4 = 49.9$$

$Q_3$ = Size of $\frac{3N}{4}$th item = $\frac{3 \times 2{,}010}{4}$ = 1,507.5th item.
Hence $Q_3$ lies in the class 121-160; the real limits of this class are 120.5-160.5.

$$Q_3 = L + \frac{\frac{3N}{4} - \text{c.f.}}{f} \times i = 120.5 + \frac{1{,}507.5 - 1{,}246}{334} \times 40 = 120.5 + 31.3 = 151.8$$

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{151.8 - 49.9}{2} = \frac{101.9}{2} = 50.95$$
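The interpolation formula for a continuous series can be sketched as follows. The helper `grouped_quartile` is hypothetical; each class is given by its real limits, the lower real limit of the first class is assumed to be −0.5 (it does not affect the result here), and the open-end class is marked with an upper limit of `None`, so the function refuses to interpolate inside it.

```python
def grouped_quartile(classes, k):
    """k-th quartile of grouped data by Q = L + ((kN/4 - c.f.) / f) * i.

    classes: list of (lower_real_limit, upper_real_limit, frequency);
    the last class may be open-ended (upper limit None).
    """
    n = sum(f for _, _, f in classes)
    target = k * n / 4              # N/4 for Q1, 3N/4 for Q3
    cf = 0                          # cumulative frequency of preceding classes
    for lower, upper, f in classes:
        if cf + f >= target:
            if upper is None:
                raise ValueError("quartile falls in the open-end class")
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    raise ValueError("target exceeds the total frequency")


farms = [(-0.5, 40.5, 394), (40.5, 80.5, 461), (80.5, 120.5, 391),
         (120.5, 160.5, 334), (160.5, 200.5, 169), (200.5, 240.5, 113),
         (240.5, None, 148)]
q1 = grouped_quartile(farms, 1)
q3 = grouped_quartile(farms, 3)
print(round(q1, 2), round(q3, 2), round((q3 - q1) / 2, 2))  # 49.91 151.82 50.95
```

Because only the classes containing $Q_1$ and $Q_3$ are used, the open-end class "241 and over" causes no difficulty, which is the merit of quartile deviation noted below.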
Merits and Limitations of Quartile Deviation 


Merits. (i) In certain respects it is superior to range as a measure
of dispersion.

(ii) It has a special utility in measuring variation in case of open-end
distributions or ones in which the data may be ranked but not measured
quantitatively.

(iii) It is also useful in erratic or badly skewed distributions, where
the other measures of dispersion would be warped by extreme values. The
quartile deviation is not affected by the presence of extreme values.*


Limitations. (i) Quartile deviation ignores 50% of the items, i.e., the
first 25% and the last 25%. As the value of quartile deviation does not
depend upon every item of the series, it cannot be regarded as a good
method of measuring dispersion.

(ii) It is not capable of mathematical manipulation.

(iii) Its value is very much affected by sampling fluctuations.

(iv) It is in fact not a measure of dispersion, as it does not really show
the scatter around an average but rather a distance on a scale, i.e., quartile
deviation is not itself measured from an average, but it is a positional
average. Consequently some statisticians speak of quartile deviation as a
measure of partition rather than as a measure of dispersion. If we really
desire to measure variation in the sense of showing the scatter around an
average, we must include the deviation of each and every item from an
average in the measurement.


Because of the above limitations quartile deviation is not often useful 
for statistical inference. 


Percentile Range 


Like the semi-interquartile range, the percentile range is also used as a
measure of dispersion. The percentile range of a set of data is defined as:

$$\text{Percentile Range} = P_{90} - P_{10}$$

where $P_{90}$ and $P_{10}$ are the 90th and 10th percentiles respectively. The
semi-percentile range, i.e., $\frac{P_{90} - P_{10}}{2}$, can also be used, but is not
commonly employed.
III. THE AVERAGE DEVIATION


The two methods of dispersion discussed above, namely, range and
quartile deviation, are not measures of dispersion in the strict sense
of the term because they do not show the scatterness around an average.
However, to study the formation of a distribution we should take the
deviations from an average. The two other measures, namely, the average
deviation and the standard deviation, help us in achieving this goal.

The average deviation is sometimes called the mean deviation. It
is the average difference between the items in a distribution and the median
or mean of that series. Theoretically there is an advantage in taking the
deviations from the median because the sum of the deviations of items from
the median is minimum when signs are ignored. However, in practice the
arithmetic mean is more frequently used in calculating the value of average
deviation, and this is the reason why it is more commonly called mean
deviation. In any case, the average used must be clearly stated in a given
problem so that any possible confusion in meaning is avoided.


* The Range and Q.D. are positional measures of dispersion as they are based 
on the position of certain items in a distribution. 
Calculation of Mean Deviation: Individual Observations
The formula for computing mean deviation is

$$\text{M.D.} = \frac{\sum |D|}{N}$$

where $|D|$ within parallel lines denotes deviations from the median ignoring
signs.*


Steps. (i) Compute the median of the series.
(ii) Take deviations of items from the median ignoring ± signs and
denote these deviations by $|D|$.
(iii) Obtain the total of these deviations, i.e., $\sum |D|$.
(iv) Divide the total obtained in step (iii) by the number of
observations.

If a distribution is normal, the range mean ± mean deviation will
include 57.5 per cent of the items in the series. If it is moderately
skewed, then we may expect approximately 57.5 per cent of the items to
fall within this range. Hence if average deviation is small, the distribution
is highly compact or uniform, since more than half of the cases are
concentrated within a small range around the mean.
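The 57.5 per cent figure can be checked from a standard property of the normal curve that the text does not derive: the mean deviation about the mean is $\sigma\sqrt{2/\pi} \approx 0.7979\sigma$, and the proportion of a normal population lying within $a$ standard deviations of the mean is $\operatorname{erf}(a/\sqrt{2})$. A short check using Python's `math.erf`:

```python
import math

# Mean deviation of a normal curve, in units of its standard deviation
md_over_sigma = math.sqrt(2 / math.pi)

# Proportion of items within mean +/- M.D., i.e. P(|Z| < M.D./sigma)
coverage = math.erf(md_over_sigma / math.sqrt(2))

print(round(md_over_sigma, 4), round(100 * coverage, 1))  # 0.7979 57.5
```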

The relative measure corresponding to the mean deviation, called the
coefficient of mean deviation, is obtained by dividing mean deviation by
the particular average used in computing mean deviation. Thus, if mean
deviation has been computed from median, the coefficient of mean
deviation shall be obtained by dividing mean deviation by median.

$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\text{Median}}$$

If mean has been used while calculating the value of mean deviation,
the coefficient of mean deviation shall be obtained by dividing
mean deviation by the mean.

Illustration 6. Calculate the mean deviation of the two income groups of five
and seven members given below:

I (Rs.):   4,000  4,200  4,400  4,600  4,800
II (Rs.):  3,000  4,000  4,200  4,400  4,600  4,800  5,800
(B. Com., Delhi, 1969)

Solution: CALCULATION OF MEAN DEVIATION

Group I                           Group II
X         |D|                     X         |D|
4,000     400                     3,000     1,400
4,200     200                     4,000       400
4,400       0                     4,200       200
4,600     200                     4,400         0
4,800     400                     4,600       200
                                  4,800       400
                                  5,800     1,400
N = 5     Σ|D| = 1,200            N = 7     Σ|D| = 4,000

(In both groups $|D|$ is the deviation from the median 4,400, ignoring signs.)

Mean Deviation: Group I
$$\text{M.D.} = \frac{\sum |D|}{N}$$
Median = Size of $\left(\frac{N+1}{2}\right)$th item = Size of $\left(\frac{5+1}{2}\right)$th item = 3rd item
Size of 3rd item is 4,400.
$$\text{M.D.} = \frac{1{,}200}{5} = 240$$
This means that the average deviation of the individual incomes from the
median income is Rs. 240.

Mean Deviation: Group II
Median = Size of $\left(\frac{N+1}{2}\right)$th item = Size of $\left(\frac{7+1}{2}\right)$th item = 4th item
Size of 4th item is 4,400.
$$\text{M.D.} = \frac{4{,}000}{7} = 571.4$$

Note. If we were to compute the coefficient of mean deviation, we would divide
mean deviation by median. Thus for the first group
$$\text{Coefficient of M.D.} = \frac{240}{4{,}400} = 0.055$$
and for the second group
$$\text{Coefficient of M.D.} = \frac{571.4}{4{,}400} = 0.130$$

* If mean deviation is computed from mean, then in that case $|D|$ shall denote
deviations of the items from mean, ignoring signs.
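Illustration 6 can be reproduced with a short sketch; the helper name `mean_deviation_from_median` is ours, and `statistics.median` from Python's standard library supplies the median.

```python
import statistics


def mean_deviation_from_median(values):
    """Return (median, mean deviation from median, coefficient of M.D.)."""
    med = statistics.median(values)
    md = sum(abs(x - med) for x in values) / len(values)  # sum |D| / N
    return med, md, md / med


group_1 = [4000, 4200, 4400, 4600, 4800]
group_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]
for group in (group_1, group_2):
    med, md, coeff = mean_deviation_from_median(group)
    print(med, round(md, 1), round(coeff, 3))
```

Both groups share the median 4,400, yet the larger mean deviation of the second group (4,000/7 against 240) reflects its two extreme incomes.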


Calculation of Mean Deviation—Discrete Series 
In discrete series the formula for calculating mean deviation is

$$\text{M.D.} = \frac{\sum f|D|}{N}$$

where $|D|$ denotes deviations from the median ignoring signs.

Steps. (i) Calculate the median of the series.

(ii) Take the deviations of the items from the median ignoring signs and
denote them by $|D|$.

(iii) Multiply these deviations by the respective frequencies and
obtain the total $\sum f|D|$.

(iv) Divide the total obtained in step (iii) by the number of
observations. This gives us the value of mean deviation.

Illustration 7. Calculate mean deviation from the following series:
X    10   11   12   13   14
f     3   12   18   12    3

Solution. CALCULATION OF MEAN DEVIATION

X      f     |D|    f|D|    c.f.
10     3      2       6       3
11    12      1      12      15
12    18      0       0      33
13    12      1      12      45
14     3      2       6      48
    N = 48         Σf|D| = 36

$$\text{M.D.} = \frac{\sum f|D|}{N}$$
Median = Size of $\left(\frac{N+1}{2}\right)$th item = Size of $\left(\frac{48+1}{2}\right)$th item = 24.5th item
Size of 24.5th item is 12.
Hence Median = 12.

$$\text{M.D.} = \frac{36}{48} = 0.75$$
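The four steps for a discrete series translate directly into code. In this sketch the hypothetical helpers find the median from the cumulative frequencies at position (N+1)/2 and then average the frequency-weighted absolute deviations; the data are those of Illustration 7.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Median of a discrete series: first value whose c.f. reaches (N+1)/2."""
    position = (sum(freqs) + 1) / 2
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return value
    raise ValueError("empty series")


def discrete_mean_deviation(values, freqs):
    """M.D. = sum f|D| / N, with D measured from the median."""
    med = discrete_median(values, freqs)
    total = sum(f * abs(x - med) for x, f in zip(values, freqs))
    return total / sum(freqs)


x = [10, 11, 12, 13, 14]
f = [3, 12, 18, 12, 3]
print(discrete_median(x, f), discrete_mean_deviation(x, f))  # 12 0.75
```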


Calculation of Mean Deviation— Continuous Series 


For calculating mean deviation in continuous series the procedure
remains the same as discussed above. The only difference is that here
we have to obtain the mid-points of the various classes and take
deviations of these mid-points from the median. The formula is the same, i.e.,

$$\text{M.D.} = \frac{\sum f|D|}{N}$$
Illustration 9. Calculate mean deviation from mean from the following data:

Marks    No. of students     Marks    No. of students
0-10           4             40-50         10
10-20          6             50-60          6
20-30         10             60-70          4
30-40         20

(B. Com., Bangalore, 1973)
Solution: CALCULATION OF MEAN DEVIATION FROM MEAN

Marks     f     m    d' = (m − 35)/10    fd'    |d'|    f|d'|
0-10      4     5          −3            −12      3       12
10-20     6    15          −2            −12      2       12
20-30    10    25          −1            −10      1       10
30-40    20    35           0              0      0        0
40-50    10    45          +1            +10      1       10
50-60     6    55          +2            +12      2       12
60-70     4    65          +3            +12      3       12
N = 60                                Σfd' = 0          Σf|d'| = 68

$$\bar{X} = A + \frac{\sum fd'}{N} \times C$$
A = 35, Σfd' = 0, N = 60, C = 10.

$$\bar{X} = 35 + \frac{0}{60} \times 10 = 35$$

Since the mean coincides with the point 35 from which d' was measured, $|D| = C|d'|$, and so

$$\text{M.D.} = \frac{\sum f|d'|}{N} \times C$$
Σf|d'| = 68, N = 60, C = 10.

$$\text{M.D.} = \frac{68}{60} \times 10 = 11.33$$
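For a continuous series the only change is that mid-points stand in for the items. This sketch (hypothetical helper name, data of Illustration 9) computes the mean directly from the mid-points rather than by step deviations, which gives the same mean here.

```python
def grouped_mean_deviation_from_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency)."""
    mids = [(lo + hi) / 2 for lo, hi, _ in classes]     # mid-points m
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(mids, freqs)) / n
    md = sum(f * abs(m - mean) for m, f in zip(mids, freqs)) / n
    return mean, md


marks = [(0, 10, 4), (10, 20, 6), (20, 30, 10), (30, 40, 20),
         (40, 50, 10), (50, 60, 6), (60, 70, 4)]
mean, md = grouped_mean_deviation_from_mean(marks)
print(mean, round(md, 2))  # 35.0 11.33
```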
Calculation of Mean Deviation: Short-cut Method
When mean or median is in fractions, calculation of mean deviation
becomes difficult. The computations can be very much simplified by
following a short-cut method, the formulas for which are as follows:

$$\text{M.D. (from Mean)} = \frac{\sum mf_A - \sum mf_B - \left(\sum f_A - \sum f_B\right)\bar{X}}{N}$$

$$\text{M.D. (from Median)} = \frac{\sum mf_A - \sum mf_B - \left(\sum f_A - \sum f_B\right)\text{Med.}}{N}$$

$\sum mf_A$ and $\sum mf_B$ stand for the totals of products of mid-points and
frequencies corresponding to mid-points above and below the average
value respectively.

$\sum f_A$ and $\sum f_B$ represent the totals of frequencies pertaining to
mid-points above and below the average value.

N = Total number of observations.
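Illustration 10. (a) Calculate mean deviation from median from the following
data:

Marks less than   No. of students     Marks less than   No. of students
80                     100            40                     32
70                      90            30                     20
60                      80            20                     13
50                      60            10                      5
(B. Com., Delhi, 1973)

Solution: CALCULATION OF MEAN DEVIATION FROM MEDIAN

Marks     f     m      mf      c.f.
0-10      5     5       25       5
10-20     8    15      120      13
20-30     7    25      175      20
30-40    12    35      420      32
40-50    28    45    1,260      60
50-60    20    55    1,100      80
60-70    10    65      650      90
70-80    10    75      750     100
N = 100

Median = Size of $\frac{N}{2}$th item = $\frac{100}{2}$ = 50th item
Median lies in the class 40-50.

$$\text{Med.} = L + \frac{\frac{N}{2} - \text{c.f.}}{f} \times i$$

where L = 40, N/2 = 50, c.f. = 32, f = 28, i = 10.

$$\text{Med.} = 40 + \frac{50 - 32}{28} \times 10 = 40 + 6.43 = 46.43$$

The mid-points above the median (46.43) are 55, 65 and 75; the mid-points
below it are 5, 15, 25, 35 and 45. Note that the mid-point of the median
class, 45, lies below the median and is therefore counted in the lower group:
$\sum mf_A$ = 1,100 + 650 + 750 = 2,500, $\sum f_A$ = 20 + 10 + 10 = 40
$\sum mf_B$ = 25 + 120 + 175 + 420 + 1,260 = 2,000, $\sum f_B$ = 5 + 8 + 7 + 12 + 28 = 60

$$\text{M.D. (from Median)} = \frac{2{,}500 - 2{,}000 - (40 - 60) \times 46.43}{100} = \frac{500 + 928.6}{100} = 14.29$$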


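(b) For the above data calculate mean deviation from mean:

Marks     f     m      mf
0-10      5     5       25
10-20     8    15      120
20-30     7    25      175
30-40    12    35      420
40-50    28    45    1,260
50-60    20    55    1,100
60-70    10    65      650
70-80    10    75      750
N = 100           Σmf = 4,500

$$\bar{X} = \frac{\sum mf}{N} = \frac{4{,}500}{100} = 45$$

$$\text{M.D. (from Mean)} = \frac{\sum mf_A - \sum mf_B - \left(\sum f_A - \sum f_B\right)\bar{X}}{N}$$

Taking the classes from 40-50 upwards as the upper group (the mid-point 45
equals the mean, so its deviation is zero and it may be placed in either group):
$\sum mf_A$ = 3,760, $\sum mf_B$ = 740, $\sum f_A$ = 68, $\sum f_B$ = 32, $\bar{X}$ = 45, N = 100

$$\text{M.D.} = \frac{3{,}760 - 740 - (68 - 32) \times 45}{100} = \frac{3{,}760 - 740 - 1{,}620}{100} = \frac{1{,}400}{100} = 14$$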
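The short-cut formulas follow from splitting $\sum f|m - A|$ into the mid-points above the average A, which contribute $f(m - A)$, and those below it, which contribute $f(A - m)$. The sketch below checks the short-cut against the direct sum for the data of Illustration 10; the helper names are ours, and a mid-point equal to the average is put in the lower group, where it contributes zero.

```python
def md_shortcut(mids, freqs, average):
    """Short-cut M.D.: (sum mf_A - sum mf_B - (sum f_A - sum f_B) * average) / N."""
    above = [(m, f) for m, f in zip(mids, freqs) if m > average]
    below = [(m, f) for m, f in zip(mids, freqs) if m <= average]
    smf_a = sum(m * f for m, f in above)
    smf_b = sum(m * f for m, f in below)
    sf_a = sum(f for _, f in above)
    sf_b = sum(f for _, f in below)
    return (smf_a - smf_b - (sf_a - sf_b) * average) / sum(freqs)


def md_direct(mids, freqs, average):
    """Direct M.D.: sum f|m - average| / N."""
    return sum(f * abs(m - average) for m, f in zip(mids, freqs)) / sum(freqs)


mids = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]
median = 40 + (50 - 32) / 28 * 10     # interpolated median, about 46.43
mean = 45
for avg in (median, mean):
    print(round(md_shortcut(mids, freqs, avg), 2), round(md_direct(mids, freqs, avg), 2))
```

Both lines print matching pairs (14.29 for the median, 14.0 for the mean), confirming that the short-cut is an exact rearrangement of the direct sum and not an approximation.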


Merits and Limitations of Mean Deviation 


Merits. (i) The outstanding advantage of the average deviation is 
its relative simplicity. It is simple to understand and easy to compute.
Anyone familiar with the concept of the average can readily appreciate the 
meaning of the average deviation. If a situation requires a measure of 
dispersion that will be presented to the general public or any group not 
thoroughly grounded in statistics, the average deviation is very useful. 


(ii) It is based on each and every item of the data. Consequently 
change in the value of any item would change the value of mean 
deviation. 

(iii) Mean deviation is less affected by the values of extreme items 
than the standard deviation. 


(iv) Since deviations are taken from a central value, comparison
about formation of different distributions can easily be made.


Limitations. (i) The greatest drawback of this method is that
algebraic signs are ignored while taking the deviations of the items. For
example, if fifty is deducted from twenty, we write 30 and not −30. This
is mathematically wrong and makes the method non-algebraic. If the
signs of the deviations are not ignored, the net sum of the deviations will
be zero if the reference point is the mean, or approximately zero if the
reference point is the median.

(ii) This method may not give us very accurate results. The reason 
is that mean deviation gives us best results when deviations are taken from 
median. But median is not a satisfactory measure when the degree of 
variability in a series is very high. And if we compute mean deviation 




from mean that is also not desirable because the sum of the deviations 
from mean (ignoring signs) is greater than the sum of the deviations from 
median (ignoring signs). If mean deviation is computed from mode that 
is also not scientific because the value of mode cannot always be 
determined. 


(iii) It is not capable of further algebraic treatment. 
(iv) It is rarely used in sociological studies. 


Because of these limitations its use is limited and it is overshadowed
as a measure of variation by the superior standard deviation. 


Usefulness of the Mean Deviation. The serious drawbacks of
the average deviation should not blind us to its practical utility. Because
of its simplicity in meaning and computation, it is especially effective in
reports presented to the general public or to groups not familiar with
statistical methods. This measure is useful for small samples with no 
elaborate analysis required. Incidentally it may be mentioned that the 
National Bureau of Economic Research has found in its work on forecasting 
business cycles, that the average deviation is the most practical measure of 


dispersion to use for this purpose. 
IV. THE STANDARD DEVIATION

The standard deviation concept was introduced by Karl Pearson in
1893. It is by far the most important and widely used measure of studying
dispersion. Its significance lies in the fact that it is free from those defects
from which the earlier methods suffer and satisfies most of the properties
of a good measure of dispersion. Standard deviation is also known as
root-mean-square deviation for the reason that it is the square root of the
mean of the squared deviations from the arithmetic mean. Standard
deviation is denoted by the small Greek letter σ (read as sigma).


The standard deviation measures the absolute dispersion or variability
of a distribution; the greater the amount of dispersion or variability, the
greater the standard deviation, for the greater will be the magnitude of
the deviations of the values from their mean. A small standard deviation
means a high degree of uniformity of the observations as well as
homogeneity of a series; a large standard deviation means just the opposite.
Thus if we have two or more comparable series with identical or nearly
identical means, it is the distribution with the smallest standard deviation
that has the most representative mean. Hence standard deviation is
extremely useful in judging the representativeness of the mean.


Difference between Average Deviation and Standard Deviation 


Both these measures of dispersion are based on each and every item 
of the distribution. But they differ in the following respects : 


(i) Algebraic signs are ignored while calculating mean deviation,
whereas in the calculation of standard deviation signs are taken into
account.

(ii) Mean deviation can be computed either from median or mean.
The standard deviation, on the other hand, is always computed from the
arithmetic mean because the sum of the squares of the deviations of items
from the arithmetic mean is the least.
Calculation of Standard Deviation: Individual Observations

In case of individual observations, standard deviation may be
computed by applying either of the following two methods:

1. By taking deviations of the items from the actual mean.

2. By taking deviations of the items from an assumed mean.

1. Deviations taken from Actual Mean. When deviations are
taken from the actual mean, the following formula is applied:

$$\sigma = \sqrt{\frac{\sum x^2}{N}}$$

where $x = (X - \bar{X})$.
Steps. (i) Calculate the actual mean of the series, i.e., $\bar{X}$.
(ii) Take the deviations of the items from the mean, i.e., find
$(X - \bar{X})$. Denote these deviations by x.
(iii) Square these deviations and obtain the total $\sum x^2$.
(iv) Divide $\sum x^2$ by the total number of observations, i.e., N, and
extract the square root. This gives us the value of standard deviation.


2. Deviations taken from Assumed Mean. When the actual
mean is in fractions, say 123.674 as in the above case, it would be too
cumbersome to take deviations from it and then obtain the squares of these
deviations. In such a case either the mean may be approximated or else
the deviations may be taken from an assumed mean and the necessary adjustment
made in the value of standard deviation. The former method of
approximation is less accurate and, therefore, invariably in such a case
deviations are taken from an assumed mean.

When deviations are taken from an assumed mean the following
formula is applied:

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}, \qquad \text{where } d = (X - A).$$

Steps. (i) Take the deviations of the items from an assumed mean,
i.e., obtain (X − A). Denote these deviations by d. Take the total of
these deviations, i.e., obtain Σd.
(ii) Square these deviations and obtain the total Σd².
(iii) Substitute the values of Σd², Σd and N in the above formula.


Illustration 11. The table below gives the marks obtained by B.Com. students
with Roll Nos. 1 to 10 at an examination. Calculate the standard deviation.

Roll No.   Marks      Roll No.   Marks
   1         43          6         60
   2         48          7         37
   3         65          8         48
   4         57          9         78
   5         31         10         59

(B. Com., Tamil Nadu, 1973)
Solution: CALCULATION OF STANDARD DEVIATION BY THE ASSUMED MEAN METHOD

Roll No.   Marks X   d = (X - 50)     d²
   1         43          -7           49
   2         48          -2            4
   3         65         +15          225
   4         57          +7           49
   5         31         -19          361
   6         60         +10          100
   7         37         -13          169
   8         48          -2            4
   9         78         +28          784
  10         59          +9           81
 N = 10   ΣX = 526     Σd = 26    Σd² = 1,826

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

Σd² = 1,826, Σd = 26, N = 10

$$\sigma = \sqrt{\frac{1826}{10} - \left(\frac{26}{10}\right)^2} = \sqrt{182.6 - 6.76} = \sqrt{175.84} = 13.26$$
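As a cross-check, the two methods above can be sketched in Python. This is only an illustration, assuming the marks of Illustration 11 are held in a plain list; the function names `sd_actual_mean` and `sd_assumed_mean` are hypothetical, not part of any library. Both compute the standard deviation with divisor N, as in the text, and the assumed mean A = 50 is arbitrary: any value of A gives the same σ.

```python
from math import sqrt

# Marks of the ten B.Com. students (Illustration 11)
marks = [43, 48, 65, 57, 31, 60, 37, 48, 78, 59]


def sd_actual_mean(values):
    """sigma = sqrt(sum(x^2) / N), where x = X - mean (actual mean method)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / n)


def sd_assumed_mean(values, a):
    """sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2), where d = X - A (assumed mean method)."""
    n = len(values)
    d = [v - a for v in values]
    sum_d = sum(d)                      # 26 when A = 50
    sum_d2 = sum(di * di for di in d)   # 1,826 when A = 50
    return sqrt(sum_d2 / n - (sum_d / n) ** 2)


print(round(sd_actual_mean(marks), 2))       # 13.26
print(round(sd_assumed_mean(marks, 50), 2))  # 13.26
```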


Calculation of Standard Deviation: Discrete Series

For calculating standard deviation in discrete series any of the 
following methods may be applied : 

1. Actual mean method. 

2. Assumed mean method. 

3. Step deviation method. 

1. Actual Mean Method. When this method is applied, deviations
are taken from the actual mean, i.e., we find (X − X̄) and denote these
deviations by x. These deviations are then squared and multiplied by the
respective frequencies. The following formula is applied:

$$\sigma = \sqrt{\frac{\sum f x^2}{N}}, \qquad \text{where } x = (X - \bar{X}).$$

However, in practice this method is rarely used because if the actual
mean is in fractions the calculations take a lot of time.

2. Assumed Mean Method. When this method is used the
following formula is applied:

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \qquad \text{where } d = (X - A).$$

Steps. (i) Take the deviations of the items from an assumed mean
and denote these deviations by d.

(ii) Multiply these deviations by the respective frequencies and
obtain the total, Σfd.

(iii) Obtain the squares of the deviations, i.e., calculate d².

(iv) Multiply the squared deviations by the respective frequencies
and obtain the total Σfd².

(v) Substitute the values in the above formula.
Illustration 12. Calculate the standard deviation from the data given below:

Size of item   Frequency   Size of item   Frequency
    3.5            3            7.5           85
    4.5            7            8.5           32
    5.5           22            9.5            8
    6.5           60

(B. Com., Business St., Andhra, 1972)
Solution: CALCULATION OF STANDARD DEVIATION

Size of item X    f    d = (X - 6.5)    fd    fd²
     3.5          3         -3          -9     27
     4.5          7         -2         -14     28
     5.5         22         -1         -22     22
     6.5         60          0           0      0
     7.5         85          1          85     85
     8.5         32          2          64    128
     9.5          8          3          24     72
               N = 217           Σfd = 128   Σfd² = 362

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$$

where Σfd² = 362, Σfd = 128, N = 217

$$\sigma = \sqrt{\frac{362}{217} - \left(\frac{128}{217}\right)^2} = \sqrt{1.668 - 0.348} = \sqrt{1.320} = 1.149$$
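The assumed mean method for a discrete series can be sketched the same way, assuming the sizes and frequencies are held in parallel lists; `sd_discrete` is a hypothetical helper name. With A = 6.5 it reproduces the totals Σfd = 128 and Σfd² = 362 of the table, and the final σ does not depend on the choice of A.

```python
from math import sqrt

# Illustration 12: size of item and frequency
sizes = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
freqs = [3, 7, 22, 60, 85, 32, 8]


def sd_discrete(xs, fs, a):
    """Assumed mean method: sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), d = X - A."""
    n = sum(fs)
    d = [x - a for x in xs]
    sum_fd = sum(f * di for f, di in zip(fs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(fs, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


print(round(sd_discrete(sizes, freqs, 6.5), 3))  # 1.149
```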
3. Step Deviation Method. When this method is used we take
a common factor from the given data. The formula for computing
standard deviation is:

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C, \qquad \text{where } d' = \frac{X - A}{C} \text{ and } C = \text{common factor}.$$

The use of the above formula simplifies calculations.
Illustration 13. Find the standard deviation for the following distribution:

X :   4.5   14.5   24.5   34.5   44.5   54.5   64.5
f :    1      5     12     22     17      9      4

(B.A. Hons., Econ., Delhi, 1974)
Solution: CALCULATION OF STANDARD DEVIATION

  X       f    d' = (X - 34.5)/10    fd'    fd'²
  4.5     1           -3             -3       9
 14.5     5           -2            -10      20
 24.5    12           -1            -12      12
 34.5    22            0              0       0
 44.5    17            1             17      17
 54.5     9            2             18      36
 64.5     4            3             12      36
        N = 70                  Σfd' = 22  Σfd'² = 130

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C$$

Here Σfd'² = 130, Σfd' = 22, C = 10, N = 70.
Substituting the values:

$$\sigma = \sqrt{\frac{130}{70} - \left(\frac{22}{70}\right)^2} \times 10 = \sqrt{1.857 - 0.099} \times 10 = \sqrt{1.758} \times 10 = 1.326 \times 10 = 13.26$$
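The step deviation method only rescales the deviations by the common factor C and multiplies the result back by C. A minimal sketch, assuming the values and frequencies of Illustration 13 as lists (`sd_step_deviation` and `sd_direct` are hypothetical names); the direct computation from the actual mean is included to show that both routes give 13.26.

```python
from math import sqrt

x = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
f = [1, 5, 12, 22, 17, 9, 4]


def sd_step_deviation(xs, fs, a, c):
    """sigma = sqrt(sum(f*d'^2)/N - (sum(f*d')/N)^2) * C, where d' = (X - A) / C."""
    n = sum(fs)
    dp = [(v - a) / c for v in xs]
    sum_fd = sum(fi * d for fi, d in zip(fs, dp))
    sum_fd2 = sum(fi * d * d for fi, d in zip(fs, dp))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * c


def sd_direct(xs, fs):
    """Direct method: deviations from the actual (frequency-weighted) mean."""
    n = sum(fs)
    mean = sum(fi * v for fi, v in zip(fs, xs)) / n
    return sqrt(sum(fi * (v - mean) ** 2 for fi, v in zip(fs, xs)) / n)


print(round(sd_step_deviation(x, f, a=34.5, c=10), 2))  # 13.26
print(round(sd_direct(x, f), 2))                        # 13.26
```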
Calculation of Standard Deviation: Continuous Series

In continuous series any of the methods discussed above for discrete
frequency distribution can be used. However, in practice it is the step
deviation method that is mostly used. The formula is

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C, \qquad \text{where } d' = \frac{m - A}{C} \text{ and } C = \text{common factor}.$$

Steps. (i) Find the mid-points of the various classes.
(ii) Take the deviations of these mid-points from an assumed mean
and denote these deviations by d.
(iii) Wherever possible, take a common factor and denote this
column by d'.
(iv) Multiply the frequencies of each class by these deviations and
obtain Σfd'.
(v) Square the deviations, multiply them by the respective
frequencies of each class and obtain Σfd'².
Thus the only difference in procedure in case of continuous series is
to find the mid-points of the various classes.

Illustration 14. Calculate the standard deviation for the following distribution
giving 300 telephone calls according to their duration in seconds.

Duration        No. of Calls    Duration        No. of Calls
(in seconds)                    (in seconds)
  0-30               9          120-150              81
 30-60              17          150-180              44
 60-90              43          180-210              24
 90-120             82

(B. Com., Bombay, 1972)
Solution: CALCULATION OF STANDARD DEVIATION

Duration       No. of Calls   Mid-points   d' = (m - 105)/30    fd'    fd'²
(in seconds)        f             m
  0-30              9            15              -3              -27     81
 30-60             17            45              -2              -34     68
 60-90             43            75              -1              -43     43
 90-120            82           105               0                0      0
120-150            81           135               1               81     81
150-180            44           165               2               88    176
180-210            24           195               3               72    216
                N = 300                                   Σfd' = 137  Σfd'² = 665

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C$$

Σfd'² = 665, Σfd' = 137, N = 300, C = 30

$$\sigma = \sqrt{\frac{665}{300} - \left(\frac{137}{300}\right)^2} \times 30 = \sqrt{2.217 - 0.209} \times 30 = \sqrt{2.008} \times 30 = 1.417 \times 30 = 42.51$$
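For a continuous series the only extra step is computing the mid-points. The sketch below assumes class limits are given as (lower, upper) pairs and, like the text, treats every item as if it sat at its class mid-point. It also returns the mean X̄ = A + (Σfd'/N) × C, the formula used later in Illustration 16. The name `grouped_mean_sd` is hypothetical.

```python
from math import sqrt


def grouped_mean_sd(classes, fs, a, c):
    """Step deviation method for a continuous series.

    classes: list of (lower, upper) class limits; fs: class frequencies;
    a: assumed mean (one of the mid-points); c: common factor.
    Returns (mean, standard deviation).
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(fs)
    dp = [(m - a) / c for m in mids]
    sum_fd = sum(f * d for f, d in zip(fs, dp))
    sum_fd2 = sum(f * d * d for f, d in zip(fs, dp))
    mean = a + sum_fd / n * c
    sd = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * c
    return mean, sd


# Illustration 14: duration of 300 telephone calls (seconds)
classes = [(0, 30), (30, 60), (60, 90), (90, 120), (120, 150), (150, 180), (180, 210)]
calls = [9, 17, 43, 82, 81, 44, 24]
mean, sd = grouped_mean_sd(classes, calls, a=105, c=30)
print(round(mean, 1), round(sd, 2))  # 118.7 42.51
```

The same helper with classes (1, 50), (51, 100), ..., (301, 350), a = 175.5 and c = 50 reproduces Illustration 16: a mean of 129.25 and a σ of about 100.24 (the text's 100.25 comes from rounding √4.019 to 2.005).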
Illustration 15. Calculate the standard deviation from the following data:

Temp. (°C)     No. of days    Temp. (°C)    No. of days
-40 to -30         10          0 to 10          65
-30 to -20         28         10 to 20         180
-20 to -10         30         20 to 30          10
-10 to 0           42

(C.A., May, 1972)
Solution: CALCULATION OF STANDARD DEVIATION

Temp. (°C)    Mid-points    f    d = m - (-5)    d' = d/10    fd'    fd'²
                  m
-40 to -30       -35       10       -30            -3        -30     90
-30 to -20       -25       28       -20            -2        -56    112
-20 to -10       -15       30       -10            -1        -30     30
-10 to 0          -5       42         0             0          0      0
  0 to 10          5       65        10             1         65     65
 10 to 20         15      180        20             2        360    720
 20 to 30         25       10        30             3         30     90
                       N = 365                         Σfd' = 339  Σfd'² = 1,107

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C$$

Here Σfd'² = 1,107, Σfd' = 339, N = 365, C = 10.
Substituting the values:

$$\sigma = \sqrt{\frac{1107}{365} - \left(\frac{339}{365}\right)^2} \times 10 = \sqrt{3.033 - 0.863} \times 10 = \sqrt{2.170} \times 10 = 1.473 \times 10 = 14.73\,^{\circ}\mathrm{C}$$


Illustration 16. The table below gives the distribution by size (in terms of paid-up
capital) of 40 different firms selected at random from 800 firms in a particular State.

Paid-up capital    No. of firms    Paid-up capital    No. of firms
(in nearest                        (in nearest
'000 rupees)                       '000 rupees)
   1-50                13             201-250              4
  51-100                9             251-300              5
 101-150                0             301-350              2
 151-200                7


Solution: CALCULATION OF MEAN AND STANDARD DEVIATION

Paid-up capital      Mid-points    f    d' = (m - 175.5)/50    fd'    fd'²
(in nearest              m
'000 rupees)
   1-50                25.5       13          -3                -39    117
  51-100               75.5        9          -2                -18     36
 101-150              125.5        0          -1                  0      0
 151-200              175.5        7           0                  0      0
 201-250              225.5        4           1                  4      4
 251-300              275.5        5           2                 10     20
 301-350              325.5        2           3                  6     18
                             N = 40                      Σfd' = -37  Σfd'² = 195

(C.E.R.T., West Bengal)

$$\bar{X} = A + \frac{\sum f d'}{N} \times C = 175.5 + \frac{-37}{40} \times 50 = 175.5 - 46.25 = 129.25$$

Total paid-up capital = 129.25 × 800 = Rs. 1,03,400 thousand

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C = \sqrt{\frac{195}{40} - \left(\frac{-37}{40}\right)^2} \times 50 = \sqrt{4.875 - 0.856} \times 50 = \sqrt{4.019} \times 50 = 2.005 \times 50 = 100.25$$
This method is difficult to apply where the frequencies and the
values of the variable are large. In such a case the step deviation method
discussed earlier should be used.


Mathematical Properties of Standard Deviation

Standard deviation has some very important mathematical properties
which considerably enhance its utility in statistical work.

1. Combined Standard Deviation. Just as it is possible to compute the
combined mean of two or more groups, similarly we can also
compute the combined standard deviation of two or more groups. Combined
standard deviation is denoted by σ₁₂ and is computed as follows:

$$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$$

where σ₁₂ = combined standard deviation; σ₁ = standard deviation
of the first group; σ₂ = standard deviation of the second group;
d₁ = (X̄₁ − X̄₁₂); d₂ = (X̄₂ − X̄₁₂).
The above formula can be extended to find out the standard deviation
of three or more groups. For example, the combined standard deviation
of three groups would be:

$$\sigma_{123} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 + N_1 d_1^2 + N_2 d_2^2 + N_3 d_3^2}{N_1 + N_2 + N_3}}$$

where d₁ = (X̄₁ − X̄₁₂₃); d₂ = (X̄₂ − X̄₁₂₃); d₃ = (X̄₃ − X̄₁₂₃).
Illustration 17. The numbers examined, the mean weight and the standard
deviation in each group of examination by two medical examiners are given below.
Find the mean weight and standard deviation of both the groups taken together.

Medical examiner   Number examined   Mean weight (lb.)   Standard deviation (lb.)
       A                 50               113                  6.5
       B                 60               120                  8.2

(B. Com., Bombay, 1973)
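Solution: Combined mean

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

N₁ = 50, N₂ = 60, X̄₁ = 113, X̄₂ = 120

$$\bar{X}_{12} = \frac{(50 \times 113) + (60 \times 120)}{50 + 60} = \frac{5650 + 7200}{110} = \frac{12850}{110} = 116.82 \text{ lb.}$$

Combined standard deviation

$$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$$

N₁ = 50, σ₁ = 6.5, N₂ = 60, σ₂ = 8.2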
*As mid-points remain the same, it is not necessary to adjust the class limits
while calculating mean and standard deviation.
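d₁ = (X̄₁ − X̄₁₂) = (113 − 116.82) = -3.82
d₂ = (X̄₂ − X̄₁₂) = (120 − 116.82) = 3.18

$$\sigma_{12} = \sqrt{\frac{50(6.5)^2 + 60(8.2)^2 + 50(-3.82)^2 + 60(3.18)^2}{110}} = \sqrt{\frac{2112.5 + 4034.4 + 729.62 + 606.74}{110}} = \sqrt{\frac{7483.26}{110}} = \sqrt{68.03} = 8.25 \text{ lb.}$$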
-4 110 


Illustration 18. The number of workers employed, the mean wage (in Rs.) per
month and the standard deviation (in Rs.) in each section of a factory are given
below. Calculate the mean wages and standard deviation of all the workers taken
together.

Section   No. of workers employed   Mean wage in Rs.   Standard deviation in Rs.
   A                50                   113                     6
   B                60                   120                     7
   C                90                   115                     8

(I.C.W.A., 1973)
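Solution:

$$\bar{X}_{123} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3}{N_1 + N_2 + N_3} = \frac{(50 \times 113) + (60 \times 120) + (90 \times 115)}{50 + 60 + 90} = \frac{5650 + 7200 + 10350}{200} = \frac{23200}{200} = \text{Rs. } 116$$

Combined standard deviation of three series:

$$\sigma_{123} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 + N_1 d_1^2 + N_2 d_2^2 + N_3 d_3^2}{N_1 + N_2 + N_3}}$$

d₁ = (X̄₁ − X̄₁₂₃) = (113 − 116) = -3
d₂ = (X̄₂ − X̄₁₂₃) = (120 − 116) = 4
d₃ = (X̄₃ − X̄₁₂₃) = (115 − 116) = -1

$$\sigma_{123} = \sqrt{\frac{50(6)^2 + 60(7)^2 + 90(8)^2 + 50(-3)^2 + 60(4)^2 + 90(-1)^2}{50 + 60 + 90}} = \sqrt{\frac{1800 + 2940 + 5760 + 450 + 960 + 90}{200}} = \sqrt{\frac{12000}{200}} = \sqrt{60} = \text{Rs. } 7.746$$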
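The combined standard deviation formula generalises to any number of groups, since each group simply contributes N·σ² + N·d² to the numerator. A sketch, assuming each group is supplied as a (N, mean, σ) tuple and that every σ was computed with divisor N; `combined_mean_sd` is a hypothetical name. It reproduces Illustrations 17 and 18.

```python
from math import sqrt


def combined_mean_sd(groups):
    """groups: list of (N, mean, sd) tuples. Returns (combined mean, combined sd)."""
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    # Each group adds N_i * sigma_i^2 + N_i * d_i^2, with d_i = mean_i - combined mean
    ss = sum(n * s * s + n * (m - mean) ** 2 for n, m, s in groups)
    return mean, sqrt(ss / total_n)


# Illustration 17: two medical examiners
m17, s17 = combined_mean_sd([(50, 113, 6.5), (60, 120, 8.2)])
print(round(m17, 2), round(s17, 2))   # 116.82 8.25

# Illustration 18: three sections of a factory
m18, s18 = combined_mean_sd([(50, 113, 6), (60, 120, 7), (90, 115, 8)])
print(round(m18, 2), round(s18, 3))   # 116.0 7.746
```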


2. Standard deviation of n natural numbers. The standard deviation
of the first n natural numbers* can be obtained by the following formula:

$$\sigma = \sqrt{\frac{n^2 - 1}{12}}$$

Thus the standard deviation of the natural numbers 1 to 10 will be

$$\sigma = \sqrt{\frac{10^2 - 1}{12}} = \sqrt{\frac{99}{12}} = \sqrt{8.25} = 2.87$$

Note. The answer would be the same if the direct method of
calculating standard deviation is used. But this short-cut formula holds good
only for the first n natural numbers.
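The short-cut can be checked against a direct computation. A sketch using Python's standard `statistics.pstdev`, the population standard deviation that divides by N as the text does; `sd_first_n_naturals` is a hypothetical name.

```python
from math import sqrt
from statistics import pstdev


def sd_first_n_naturals(n):
    """Short-cut formula: sigma = sqrt((n^2 - 1) / 12) for 1, 2, ..., n."""
    return sqrt((n * n - 1) / 12)


for n in (5, 10, 25):
    direct = pstdev(list(range(1, n + 1)))
    print(n, round(sd_first_n_naturals(n), 4), round(direct, 4))
# for n = 10 both columns show 2.8723
```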


*By natural numbers we mean only positive integers, e.g., 1, 2, 3, 4, 5, ...
3. The sum of the squares of the deviations of items in the series 
from their arithmetic mean is minimum. In other words, the sum of the 
squares of the deviations of items of any series from a value other than the 
arithmetic mean would always be greater. This is the reason why 
standard deviation is always computed from the arithmetic mean. 
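Property 3 can be seen numerically, since Σ(X − a)² = Σ(X − X̄)² + N(X̄ − a)², so any point a other than X̄ adds a positive amount. A small sketch using the marks of Illustration 11; `sum_sq_about` is a hypothetical helper name.

```python
marks = [43, 48, 65, 57, 31, 60, 37, 48, 78, 59]   # Illustration 11
mean = sum(marks) / len(marks)                     # 52.6


def sum_sq_about(values, a):
    """Sum of squared deviations of the values from an arbitrary point a."""
    return sum((v - a) ** 2 for v in values)


for a in (50, 52, mean, 53, 55):
    print(a, round(sum_sq_about(marks, a), 2))
# the smallest total (1758.4) occurs at a = 52.6, the arithmetic mean
```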
4. For a normal (symmetrical, bell-shaped) distribution, the following area
relationships hold good:
Mean ± 1σ covers 68.27% of the items.
Mean ± 2σ covers 95.45% of the items.
Mean ± 3σ covers 99.73% of the items.

This can be illustrated by the following diagram:

[Figure: DISTRIBUTION OF THE ITEMS IN TERMS OF MEAN AND STANDARD DEVIATION, showing the ranges X̄ ± σ, X̄ ± 2σ and X̄ ± 3σ]
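The percentages in property 4 are those of the normal curve, and they follow from the error function: the proportion lying within X̄ ± kσ is erf(k/√2). A sketch assuming the distribution is exactly normal; `normal_coverage` is a hypothetical name and `math.erf` is the standard library function.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Proportion of a normal distribution lying within mean ± k standard deviations."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"Mean ± {k} sigma covers {100 * normal_coverage(k):.2f}% of the items")
# 68.27%, 95.45%, 99.73%
```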


Relation between Measures of Dispersion

In a normal distribution there is a fixed relationship between the
three most commonly used measures of dispersion. The quartile deviation
is smallest, the mean deviation next and the standard deviation is
largest, in the following proportions*:

$$\text{Q.D.} = \frac{2}{3}\sigma \quad \text{and} \quad \text{M.D.} = \frac{4}{5}\sigma$$

These relationships can be easily memorized because of the sequence
2, 3, 4, 5. The same proportions tend to hold true for many distributions
that are approximately normal. They are useful in estimating one measure of
dispersion when another is known, or in roughly checking the accuracy of
a calculated value. If the computed σ differs very widely from its value
estimated from Q.D. or M.D., either an error has been made or the
distribution differs considerably from normal.

Another comparison may be made of the proportion of items that
are typically included within the range of one Q.D., M.D. or S.D.
measured both above and below the mean. In a normal distribution:

X̄ ± Q.D. includes 50 per cent of the items
X̄ ± M.D. includes 57.51 per cent of the items
X̄ ± σ includes 68.27 per cent, or about two-thirds, of the items.

*More precisely, Q.D. = 0.6745σ and M.D. = 0.7979σ.
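The more precise constants in the footnote come from the standard normal curve: Q.D. is the upper quartile of a standard normal variable, and the mean deviation about the mean is √(2/π). A sketch using Python's `statistics.NormalDist` (available from Python 3.8); the helper name `coverage` is hypothetical.

```python
from math import pi, sqrt
from statistics import NormalDist

z = NormalDist()          # standard normal: mean 0, sigma 1
qd = z.inv_cdf(0.75)      # quartile deviation in units of sigma
md = sqrt(2 / pi)         # mean deviation in units of sigma


def coverage(k):
    """Proportion of a normal distribution within mean ± k sigma."""
    return z.cdf(k) - z.cdf(-k)


print(round(qd, 4), round(md, 4))  # 0.6745 0.7979 (rules of thumb: 2/3 and 4/5)
print(round(100 * coverage(qd), 2), round(100 * coverage(md), 2), round(100 * coverage(1), 2))
# 50.0 57.51 68.27
```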
Illustration 19. The breaking strength of 80 test pieces of a certain alloy is
given in the following table, the unit being given to the nearest thousand pounds per
square inch.

Breaking strength   No. of pieces
     44-46                3
     46-48               24
     48-50               27
     50-52               21
     52-54                5
     Total               80

Calculate the average breaking strength of the alloy and the standard deviation.
Calculate the percentage of observations lying within the limits of mean ± 2σ.

(I.C.W.A., 1965)
Solution: CALCULATION OF MEAN AND STANDARD DEVIATION

Breaking strength    f     m    d' = (m - 49)/2    fd'    fd'²
    44-46            3    45          -2            -6     12
    46-48           24    47          -1           -24     24
    48-50           27    49           0             0      0
    50-52           21    51           1            21     21
    52-54            5    53           2            10     20
                 N = 80                       Σfd' = 1  Σfd'² = 77

A = 49, Σfd' = 1, N = 80, C = 2.

$$\bar{X} = A + \frac{\sum f d'}{N} \times C = 49 + \frac{1}{80} \times 2 = 49 + 0.025 = 49.025$$

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C = \sqrt{\frac{77}{80} - \left(\frac{1}{80}\right)^2} \times 2 = \sqrt{0.9625 - 0.0002} \times 2 = 0.981 \times 2 = 1.962$$

X̄ = 49.025, σ = 1.962.
The value of

X̄ ± 2σ = 49.025 ± 2(1.962) = 49.025 ± 3.924, i.e., 45.101 to 52.949, or 45 to 53 approx.

We have to calculate the percentage of items lying between 45 and 53. For this
we make the assumption that the items are equally distributed within each class.
Since there are 3 items between 44 and 46, there would be 1.5 between 45 and 46.
Similarly, between 52 and 53 the frequency would be 2.5. Thus the total frequency
between 45 and 53 is (1.5 + 24 + 27 + 21 + 2.5) = 76. The percentage is
(76/80) × 100 = 95. Thus 95 per cent of the observations lie within the limits
mean ± 2σ.
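The interpolation step rests on the stated assumption that items are spread evenly within each class. A sketch of that assumption, using the rounded limits 45 and 53 as the text does; `freq_between` is a hypothetical helper name.

```python
def freq_between(classes, fs, lo, hi):
    """Frequency between lo and hi, assuming items are spread evenly within each class."""
    total = 0.0
    for (a, b), f in zip(classes, fs):
        overlap = max(0.0, min(b, hi) - max(a, lo))   # part of the class inside [lo, hi]
        total += f * overlap / (b - a)
    return total


classes = [(44, 46), (46, 48), (48, 50), (50, 52), (52, 54)]
pieces = [3, 24, 27, 21, 5]
inside = freq_between(classes, pieces, 45, 53)
print(inside, 100 * inside / sum(pieces))  # 76.0 95.0
```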

Checking Accuracy of Computations

Charlier's test for checking the accuracy of computation can also be
applied in the case of standard deviation. For applying the test we have to
add one more column, namely f(d' + 1)², to our table. The test consists of
the equation:

$$\sum f(d'+1)^2 = \sum f d'^2 + 2\sum f d' + \sum f$$

This means that the sum of the last column should be equal to the
sum of the fd'² column plus twice the sum of the fd' column plus the sum
of the f column.
Illustration 20. Calculate the standard deviation and apply Charlier's check to
verify the calculations.

Marks    No. of students    Marks    No. of students
 0-10          8            30-40          6
10-20         12            40-50          4
20-30         20
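Solution: CALCULATION OF STANDARD DEVIATION

Marks     f    Mid-points m   d' = (m - 25)/10   fd'    fd'²   f(d' + 1)²
 0-10     8         5              -2            -16     32        8
10-20    12        15              -1            -12     12        0
20-30    20        25               0              0      0       20
30-40     6        35               1              6      6       24
40-50     4        45               2              8     16       36
      N = 50                             Σfd' = -14  Σfd'² = 66  Σf(d' + 1)² = 88

Charlier's check:

$$\sum f(d'+1)^2 = \sum f d'^2 + 2\sum f d' + \sum f$$
$$88 = 66 + 2(-14) + 50 = 88$$

Thus the calculations are correct.

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C = \sqrt{\frac{66}{50} - \left(\frac{-14}{50}\right)^2} \times 10 = \sqrt{1.32 - 0.0784} \times 10 = 1.114 \times 10 = 11.14$$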
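Charlier's check works because f(d' + 1)² = fd'² + 2fd' + f holds row by row, so the column totals must agree. Since it is an algebraic identity, it can only catch slips in the hand-computed totals; a misread frequency carried consistently through every column will pass. A sketch that applies the check to the totals of Illustration 20 (A = 25, C = 10 as in the table); `charlier_ok` is a hypothetical name.

```python
from math import sqrt


def charlier_ok(sum_f, sum_fd, sum_fd2, sum_f_d_plus_1_sq):
    """Charlier's check on hand-computed column totals:
    sum f(d'+1)^2 must equal sum fd'^2 + 2 sum fd' + sum f."""
    return sum_f_d_plus_1_sq == sum_fd2 + 2 * sum_fd + sum_f


# Column totals from the table of Illustration 20
print(charlier_ok(50, -14, 66, 88))   # True
# Had sum fd' been miscopied as -12, the check would flag it
print(charlier_ok(50, -12, 66, 88))   # False

# Recomputing the totals from the raw columns gives the same figures
f = [8, 12, 20, 6, 4]
d = [-2, -1, 0, 1, 2]                 # d' = (m - 25) / 10
sums = (sum(f),
        sum(fi * di for fi, di in zip(f, d)),
        sum(fi * di * di for fi, di in zip(f, d)),
        sum(fi * (di + 1) ** 2 for fi, di in zip(f, d)))
print(sums)                           # (50, -14, 66, 88)
sigma = sqrt(sums[2] / sums[0] - (sums[1] / sums[0]) ** 2) * 10
print(round(sigma, 2))                # 11.14
```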


Coefficient of Variation

The standard deviation discussed above is an absolute measure of
dispersion. The corresponding relative measure is known as the coefficient
of variation. This measure, developed by Karl Pearson, is the most
commonly used measure of relative variation. It is used in problems
where we want to compare the variability of two or more series.
That series (or group) for which the coefficient of variation is greater
is said to be more variable or, conversely, less consistent, less uniform,
less stable or less homogeneous. On the other hand, the series for
which the coefficient of variation is less is said to be less variable or more
consistent, more uniform, more stable or more homogeneous. The coefficient
of variation is denoted by the symbol V and is obtained as follows:

$$\text{Coefficient of variation or C.V.} = \frac{\sigma}{\bar{X}} \times 100$$

It may be pointed out that although any measure of dispersion can
be used in conjunction with any average in computing relative dispersion,
statisticians, in fact, almost always use the standard deviation as the
measure of dispersion and the arithmetic mean as the average. When the
relative dispersion is stated in terms of the arithmetic mean and the
standard deviation, the resulting percentage is known as the coefficient of
variation or coefficient of variability.*

A distinction is sometimes made between coefficient of variation and
coefficient of standard deviation. The former is always a percentage;
the latter is just the ratio of standard deviation to mean, i.e., σ/X̄.

* W.A. Waugh: Elements of Statistical Methods.
Illustration 21. From the following table of marks obtained by A and B in 10
tests of 100 marks each, find out who is more intelligent and who is more consistent:

A :  25   50   45   30   70   42   36   48   34   60
B :  10   70   50   20   95   55   42   60   48   80

(B. Com., Bangalore, 1972)


Solution. In order to find out the more intelligent student between A and B we
will calculate the average marks, and for finding out the more consistent student we will
compare the coefficients of variation.

COMPUTING MEAN AND STANDARD DEVIATION

        Student A                         Student B
  X    x = (X - 44)     x²         Y    y = (Y - 53)     y²
 25        -19          361        10       -43         1,849
 50          6           36        70        17           289
 45          1            1        50        -3             9
 30        -14          196        20       -33         1,089
 70         26          676        95        42         1,764
 42         -2            4        55         2             4
 36         -8           64        42       -11           121
 48          4           16        60         7            49
 34        -10          100        48        -5            25
 60         16          256        80        27           729
ΣX = 440   Σx = 0   Σx² = 1,710   ΣY = 530   Σy = 0   Σy² = 5,928

Student A:

$$\bar{X} = \frac{\sum X}{N} = \frac{440}{10} = 44, \qquad \sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{1710}{10}} = \sqrt{171} = 13.08$$
$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{13.08}{44} \times 100 = 29.73$$

Student B:

$$\bar{Y} = \frac{\sum Y}{N} = \frac{530}{10} = 53, \qquad \sigma = \sqrt{\frac{\sum y^2}{N}} = \sqrt{\frac{5928}{10}} = \sqrt{592.8} = 24.35$$
$$\text{C.V.} = \frac{\sigma}{\bar{Y}} \times 100 = \frac{24.35}{53} \times 100 = 45.94$$

Since the average marks obtained by student B are higher, he may be regarded as the
more intelligent student. However, student A is more consistent because the coefficient
of variation is much less in case of student A than student B.
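The coefficient of variation is easy to compute with Python's standard `statistics` module, whose `pstdev` divides by N as the text does. A sketch for Illustration 21 (the helper `cv` is a hypothetical name). Without rounding σ first, student A's C.V. comes to 29.72 rather than 29.73; the conclusion is unchanged.

```python
from statistics import mean, pstdev

a = [25, 50, 45, 30, 70, 42, 36, 48, 34, 60]   # marks of student A
b = [10, 70, 50, 20, 95, 55, 42, 60, 48, 80]   # marks of student B


def cv(values):
    """Coefficient of variation in per cent: sigma / mean * 100."""
    return 100 * pstdev(values) / mean(values)


for name, marks in (("A", a), ("B", b)):
    print(name, mean(marks), round(pstdev(marks), 2), round(cv(marks), 2))
# higher mean -> more intelligent (B); lower C.V. -> more consistent (A)
```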


Illustration 22. Suppose that samples of polythene bags from two manufacturers,
A and B, are tested by a prospective buyer for bursting pressure, with the following
results:

Bursting pressure    Number of bags
     (lb.)             A       B
   5.0-9.9             2       9
  10.0-14.9            9      11
  15.0-19.9           29      18
  20.0-24.9           54      32
  25.0-29.9           11      27
  30.0-34.9            5      13
Which set of bags has the higher average bursting pressure? Which has more
uniform pressure? If prices are the same, which manufacturer's bags would be preferred
by the buyer? Why? (B. Com., Delhi, 1966)

Solution. For determining which set of bags has the higher average bursting
pressure, calculate the arithmetic mean, and for finding out which has more uniform
pressure, compute the coefficient of variation.

Manufacturer A:
CALCULATION OF MEAN AND STANDARD DEVIATION

Bursting pressure (lb.)    f      m     d' = (m - 17.45)/5    fd'    fd'²
   4.95-9.95               2     7.45         -2              -4      8
   9.95-14.95              9    12.45         -1              -9      9
  14.95-19.95             29    17.45          0               0      0
  19.95-24.95             54    22.45          1              54     54
  24.95-29.95             11    27.45          2              22     44
  29.95-34.95              5    32.45          3              15     45
                       N = 110                          Σfd' = 78  Σfd'² = 160

$$\bar{X} = A + \frac{\sum f d'}{N} \times C$$

Here A = 17.45, Σfd' = 78, N = 110, C = 5

$$\bar{X} = 17.45 + \frac{78}{110} \times 5 = 17.45 + 3.55 = 21$$

$$\sigma = \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \times C = \sqrt{\frac{160}{110} - \left(\frac{78}{110}\right)^2} \times 5 = \sqrt{1.455 - 0.503} \times 5 = \sqrt{0.952} \times 5 = 0.976 \times 5 = 4.88$$

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{4.88}{21} \times 100 = 23.24$$

Manufacturer B:
CALCULATION OF MEAN AND STANDARD DEVIATION

Bursting pressure (lb.)    f      m     d' = (m - 17.45)/5    fd'    fd'²
   4.95-9.95               9     7.45         -2             -18     36
   9.95-14.95             11    12.45         -1             -11     11
  14.95-19.95             18    17.45          0               0      0
  19.95-24.95             32    22.45          1              32     32
  24.95-29.95             27    27.45          2              54    108
  29.95-34.95             13    32.45          3              39    117
                       N = 110                          Σfd' = 96  Σfd'² = 304

$$\bar{X} = A + \frac{\sum f d'}{N} \times C = 17.45 + \frac{96}{110} \times 5 = 17.45 + 4.36 = 21.81$$

$$\sigma = \sqrt{\frac{304}{110} - \left(\frac{96}{110}\right)^2} \times 5 = \sqrt{2.764 - 0.762} \times 5 = \sqrt{2.002} \times 5 = 1.4149 \times 5 = 7.0745$$

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{7.0745}{21.81} \times 100 = 32.44\%$$

Since the average bursting pressure is higher for manufacturer B, the bags
of manufacturer B have a higher bursting pressure. The bags of manufacturer A have
more uniform pressure since the coefficient of variation is less for manufacturer A. If
prices are the same, the bags of manufacturer A should be preferred by the buyer
because they have more uniform pressure.

Illustration 23. An analysis of the monthly wages paid to workers in two firms,
A and B, belonging to the same industry gives the following results:

                                          Firm A       Firm B
No. of wage earners                        586          648
Average monthly wage                    Rs. 52.5     Rs. 47.5
Variance of the distribution of wages      100          121

(a) Which firm, A or B, pays a larger amount as monthly wages?
(b) Which firm, A or B, has greater variability in individual wages?
(c) Find the average monthly wage and the standard deviation of the wages of all
the workers in the two firms, A and B, taken together.
(B. Com., Madras, 1972; I.C.W.A., 1973)


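Solution:

(a) Firm A

Total wage bill = 586 × 52.5 = Rs. 30,765      (∵ ΣX = NX̄)

Firm B

Total wage bill = 648 × 47.5 = Rs. 30,780

Since the total wage bill is greater for firm B, it pays a larger amount as monthly
wages.

(b) Firm A

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{10}{52.5} \times 100 = 19.05$$

Firm B

$$\text{C.V.} = \frac{11}{47.5} \times 100 = 23.16$$

Since the coefficient of variation is greater for firm B, it shows greater variability in
individual wages.

(c)

$$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$$

N₁ = 586, X̄₁ = 52.5, N₂ = 648, X̄₂ = 47.5

$$\bar{X}_{12} = \frac{(586 \times 52.5) + (648 \times 47.5)}{586 + 648} = \frac{30765 + 30780}{1234} = \frac{61545}{1234} = \text{Rs. } 49.9$$

$$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$$

N₁ = 586, N₂ = 648, σ₁ = 10, σ₂ = 11

d₁ = (X̄₁ − X̄₁₂) = (52.5 − 49.9) = 2.6
d₂ = (X̄₂ − X̄₁₂) = (47.5 − 49.9) = -2.4

$$\sigma_{12} = \sqrt{\frac{586(10)^2 + 648(11)^2 + 586(2.6)^2 + 648(-2.4)^2}{586 + 648}} = \sqrt{\frac{58600 + 78408 + 3961.36 + 3732.48}{1234}} = \sqrt{\frac{144701.84}{1234}} = \sqrt{117.26} = \text{Rs. } 10.83$$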


Variance

The term variance was used to describe the square of the standard
deviation by R.A. Fisher in 1918. The concept of variance is highly
important in advanced work where it is possible to split the total into
several parts, each attributable to one of the factors causing variation in
the original series. Variance is defined as follows:

$$\text{Variance} = \frac{\sum (X - \bar{X})^2}{N}$$

Thus, variance is nothing but the square of the standard deviation, i.e.,

$$\text{Variance} = \sigma^2 \quad \text{or} \quad \sigma = \sqrt{\text{Variance}}$$

In a frequency distribution where deviations are taken from an assumed
mean, variance may directly be computed as follows:

$$\text{Variance} = \left[\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2\right] \times C^2$$

where d' = (X − A)/C and C = common factor.

Variance and Standard Deviation Compared

Both the variance and the standard deviation are measures of
variability in a population. These two measures are closely related, as is
clear from the formula Variance = σ². Variance is the average of the squared
deviations from the arithmetic mean, and standard deviation is the square
root of the variance. In a subsequent chapter* the significance of variance
analysis will be discussed at length. The smaller the value of σ², the less
the variability, or the greater the uniformity, in the population.

Illustration 24. The following table gives the marks obtained by a group of 80
students in an examination. Calculate the variance.

Marks obtained   No. of students   Marks obtained   No. of students
    10-14              2               34-38              10
    14-18              4               38-42               8
    18-22              4               42-46               4
    22-26              8               46-50               6
    26-30             12               50-54               2
    30-34             16               54-58               4

(B.A. Hons., Econ., Delhi, 1971)

Solution: CALCULATION OF VARIANCE

Marks     m     f    d' = (m - 32)/4    fd'    fd'²
10-14    12     2         -5            -10     50
14-18    16     4         -4            -16     64
18-22    20     4         -3            -12     36
22-26    24     8         -2            -16     32
26-30    28    12         -1            -12     12
30-34    32    16          0              0      0
34-38    36    10          1             10     10
38-42    40     8          2             16     32
42-46    44     4          3             12     36
46-50    48     6          4             24     96
50-54    52     2          5             10     50
54-58    56     4          6             24    144
            N = 80                  Σfd' = 30  Σfd'² = 562

*Please refer to chapter on Analysis of Variance.

$$\text{Variance} = \left[\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2\right] \times C^2$$

Σfd'² = 562, Σfd' = 30, N = 80, C = 4
Substituting the values:

$$\text{Variance} = \left[\frac{562}{80} - \left(\frac{30}{80}\right)^2\right] \times 16 = (7.025 - 0.140625) \times 16 = 6.884375 \times 16 = 110.15$$
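Because the step deviation sums are in units of C, the bracket must be multiplied by C², not C, to obtain the variance. A sketch for Illustration 24 (hypothetical name `variance_step`), cross-checked against the standard library's `statistics.pvariance` applied to the mid-points repeated by their frequencies:

```python
from statistics import pvariance

mids = list(range(12, 57, 4))                        # 12, 16, ..., 56
students = [2, 4, 4, 8, 12, 16, 10, 8, 4, 6, 2, 4]   # N = 80


def variance_step(ms, fs, a, c):
    """Variance = [sum(f d'^2)/N - (sum(f d')/N)^2] * C^2, with d' = (m - A)/C."""
    n = sum(fs)
    dp = [(m - a) / c for m in ms]
    sum_fd = sum(f * d for f, d in zip(fs, dp))
    sum_fd2 = sum(f * d * d for f, d in zip(fs, dp))
    return (sum_fd2 / n - (sum_fd / n) ** 2) * c * c


expanded = [m for m, f in zip(mids, students) for _ in range(f)]
print(round(variance_step(mids, students, a=32, c=4), 2))  # 110.15
print(round(pvariance(expanded), 2))                        # 110.15
```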


Merits and Limitations of Standard Deviation

Merits. (i) The standard deviation is the best measure of variation
because of its mathematical characteristics. It is based on every item of
the distribution. Also it is amenable to algebraic treatment and is less
affected by fluctuations of sampling than most other measures of dispersion.

(ii) It is possible to calculate the combined standard deviation of
two or more groups. This is not possible with any other measure.

(iii) For comparing the variability of two or more distributions, the
coefficient of variation is considered to be most appropriate, and this is
based on the mean and standard deviation.

(iv) Standard deviation is most prominently used in further statistical
work. For example, in computing skewness, correlation, etc., use is made
of standard deviation. It is a key-note in sampling and provides a unit of
measurement for the normal distribution.*

Limitations. (i) As compared to other measures it is difficult to
compute. However, this does not reduce the importance of this measure
because of the high degree of accuracy of the results it gives.

(ii) It gives more weight to extreme items and less to those which
are near the mean. This is because the squares of the
deviations which are big in size would be proportionately greater than the
squares of those deviations which are comparatively small. The deviations
2 and 8 are in the ratio of 1 : 4 but their squares, i.e., 4 and 64, would be
in the ratio of 1 : 16.


*For details please refer to chapter on 'Theoretical Distributions'.

Correcting Incorrect Values of Mean and Standard Deviation

Mistakes in calculations are always possible. Sometimes it so happens
that while calculating mean and standard deviation we unconsciously copy
out wrong items. For example, an item 21 may be copied as 12. Similarly,
an item 127 may be taken as only 27. In such cases, if the entire
calculations are done again, it would become too tedious a task. By
adopting a very simple procedure we can correct the incorrect values of
mean and standard deviation. For obtaining the correct mean we find the
correct ΣX by deducting from the original ΣX the wrong items and adding
to it the correct items. Similarly, for calculating the correct standard deviation
we obtain the value of the correct ΣX². The following illustrations shall
clarify the calculations.

Illustration 25. (a) From a frequency distribution consisting of 18
observations the mean and the standard deviation were found to be 7 and 4 respectively.
But on comparing with the original data it was found that a figure 12 was miscopied
as 21 in calculations. Calculate the correct mean and standard deviation.
(I.C.W.A., 1972)


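Solution. Calculation of correct mean

$$\bar{X} = \frac{\sum X}{N} \quad \text{or} \quad \sum X = N\bar{X}; \qquad N = 18,\ \bar{X} = 7$$

ΣX = 18 × 7 = 126

But this is the incorrect ΣX.
Correct ΣX = 126 − 21 + 12 = 117

$$\text{Correct mean} = \frac{\sum X}{N} = \frac{117}{18} = 6.5$$

Calculation of standard deviation

$$\sigma^2 = \frac{\sum X^2}{N} - \bar{X}^2 \quad \Rightarrow \quad 16 = \frac{\sum X^2}{18} - 49 \quad \Rightarrow \quad \frac{\sum X^2}{18} = 65$$

ΣX² = 65 × 18 = 1,170. But this is the incorrect ΣX².
Correct ΣX² = 1,170 − (21)² + (12)² = 1,170 − 441 + 144 = 873

∴ Correct standard deviation

$$\sigma = \sqrt{\frac{\text{Correct } \sum X^2}{N} - (\text{Correct } \bar{X})^2} = \sqrt{\frac{873}{18} - (6.5)^2} = \sqrt{48.5 - 42.25} = \sqrt{6.25} = 2.5$$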


Illustration 25. (b) The mean and standard deviation of a set of 100 observations
were worked out as 40 and 5 respectively by a computer who by mistake took the
value 50 in place of 40 for one observation. Recalculate the correct mean and
variance. (B.A. Hons. Econ., Delhi, 1969)

Solution. Correct mean: X̄ = ΣX/N, so ΣX = NX̄; N = 100, X̄ = 40.
ΣX = 100 × 40 = 4,000
But this is not the correct ΣX because one item has been taken as 50 instead
of 40.
∴ Correct ΣX = 4,000 − 50 + 40 = 3,990

$$\text{Correct mean} = \frac{3990}{100} = 39.90$$

Correct variance

$$\text{Variance} = \frac{\sum X^2}{N} - (\bar{X})^2$$

Variance = σ² = (5)² = 25, N = 100

$$25 = \frac{\sum X^2}{100} - (40)^2 \quad \Rightarrow \quad 2500 = \sum X^2 - 160000 \quad \Rightarrow \quad \sum X^2 = 162500$$

Correct ΣX² = 162,500 − (50)² + (40)² = 162,500 − 2,500 + 1,600 = 161,600

$$\text{Correct variance} = \frac{\text{Correct } \sum X^2}{N} - (\text{Correct } \bar{X})^2 = \frac{161600}{100} - (39.9)^2 = 1616 - 1592.01 = 23.99$$

Thus the correct mean = 39.9 and the correct variance = 23.99.


Illustration 26. The number of employees, the average wages per employee and the
variance of the wages per employee for two factories are given below:

                                             Factory A    Factory B
Number of employees                              50          100
Average wages per employee per month (Rs.)      120           85
Variance of the wages per employee
per month (Rs.)                                   9           16

(a) In which factory is there greater variation in the distribution of wages per
employee?
(b) Suppose in factory B, the wages of an employee were wrongly noted as
Rs. 120 instead of Rs. 100; what would be the correct variance for factory B?
(B.Com., Delhi, 1969)

Solution:
(a) Variation in the Distribution of Wages

Factory A: σ = √9 = 3, X̄ = 120

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 = \frac{3}{120} \times 100 = 2.5$$

Factory B: σ = √16 = 4, X̄ = 85

$$\text{C.V.} = \frac{4}{85} \times 100 = 4.71$$

The coefficient of variation is greater for factory B; hence there is greater variation
in the distribution of wages per employee in factory B.

(b) Correct Variance. For finding the correct variance we first have to find the
correct mean.

$$\bar{X} = \frac{\sum X}{N} \quad \text{or} \quad N\bar{X} = \sum X$$

ΣX = 100 × 85 = 8,500
Correct ΣX = 8,500 − 120 + 100 = 8,480, so the correct mean = 8,480/100 = 84.8

$$\text{Variance} = \sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

Substituting the values of σ², X̄, etc.,

$$16 = \frac{\sum X^2}{100} - (85)^2 \quad \Rightarrow \quad 1600 = \sum X^2 - 722500 \quad \Rightarrow \quad \sum X^2 = 724100$$

But in this total 100 has been taken as 120.
Correct ΣX² = 724,100 − (120)² + (100)² = 724,100 − 14,400 + 10,000 = 719,700

$$\text{Correct variance} = \frac{\text{Correct } \sum X^2}{N} - (\text{Correct } \bar{X})^2 = \frac{719700}{100} - (84.8)^2 = 7197 - 7191.04 = 5.96$$
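The correction procedure of Illustrations 25 and 26 can be written as one function: recover ΣX = NX̄ and ΣX² = N(σ² + X̄²), swap the wrong items for the correct ones and recompute. A sketch, assuming the original mean and standard deviation were computed with divisor N; `corrected_mean_sd` is a hypothetical name.

```python
from math import sqrt


def corrected_mean_sd(n, mean, sd, wrong, right):
    """Correct the mean and s.d. of n items when the values in `wrong`
    were recorded in place of the values in `right` (item counts unchanged)."""
    sum_x = n * mean
    sum_x2 = n * (sd ** 2 + mean ** 2)   # from sigma^2 = sum(X^2)/N - mean^2
    sum_x += sum(right) - sum(wrong)
    sum_x2 += sum(r * r for r in right) - sum(w * w for w in wrong)
    new_mean = sum_x / n
    return new_mean, sqrt(sum_x2 / n - new_mean ** 2)


print(corrected_mean_sd(18, 7, 4, wrong=[21], right=[12]))   # (6.5, 2.5)
m, s = corrected_mean_sd(100, 40, 5, wrong=[50], right=[40])
print(round(m, 2), round(s ** 2, 2))                          # 39.9 23.99
```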
Illustration 27. The mean of 5 observations is 4.4 and the variance is 8.24. If
three of the five observations are 1, 2 and 6, find the other two.
(M.A. Econ., Delhi, 1967)

Solution.

$$\bar{X} = \frac{\sum X}{N} \quad \text{or} \quad \sum X = N\bar{X}$$

Here N = 5, X̄ = 4.4; ΣX = 5 × 4.4 = 22.
Let the two missing items be x₁ and x₂.
1 + 2 + 6 + x₁ + x₂ = 22
or x₁ + x₂ = 22 − 9
or x₁ + x₂ = 13                                    ...(i)

$$\sigma^2 = \frac{\sum X^2}{N} - \bar{X}^2 \quad \Rightarrow \quad 8.24 = \frac{\sum X^2}{5} - (4.4)^2$$

ΣX² = (8.24 + 19.36) × 5 = 41.20 + 96.80 = 138
ΣX² = 1² + 2² + 6² + x₁² + x₂² = 41 + x₁² + x₂² = 138
∴ x₁² + x₂² = 138 − 41 = 97
(x₁ + x₂)² = x₁² + x₂² + 2x₁x₂
(13)² = 97 + 2x₁x₂, or 169 = 97 + 2x₁x₂
2x₁x₂ = 169 − 97 = 72, or x₁x₂ = 36
(x₁ − x₂)² = x₁² + x₂² − 2x₁x₂ = 97 − 2(36) = 25
x₁ − x₂ = 5                                         ...(ii)
Adding (i) and (ii): 2x₁ = 18, so x₁ = 9
Putting the value of x₁ in equation (i):
9 + x₂ = 13, so x₂ = 4

Thus the two missing values are x₁ = 9 and x₂ = 4.
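The algebra above generalises to any case where exactly two items are missing: their sum comes from the mean, their sum of squares from the variance, and the two values are then fixed by (x₁ − x₂)². A sketch (hypothetical name `two_missing`), using `fractions.Fraction` so that decimal inputs such as 4.4 and 8.24 are handled exactly:

```python
from fractions import Fraction
from math import sqrt


def two_missing(n, mean, variance, known):
    """Recover two missing observations (larger first).

    n: total number of observations; mean, variance: as reported (divisor N);
    known: the n - 2 observations that are available.
    """
    mean, variance = Fraction(mean), Fraction(variance)
    s = n * mean - sum(known)                                   # x1 + x2
    q = n * (variance + mean ** 2) - sum(k * k for k in known)  # x1^2 + x2^2
    product = (s * s - q) / 2                                   # x1 * x2
    diff_sq = q - 2 * product                                   # (x1 - x2)^2
    if diff_sq < 0:
        raise ValueError("the given mean and variance are inconsistent")
    diff = sqrt(float(diff_sq))
    return (float(s) + diff) / 2, (float(s) - diff) / 2


print(two_missing(5, "4.4", "8.24", [1, 2, 6]))  # (9.0, 4.0)
```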
V. LORENZ CURVE

The Lorenz Curve, devised by Max O. Lorenz, a famous economic
statistician, is a graphic method of studying dispersion. This curve was
used by him for the first time to measure the distribution of wealth and
income. Now the curve is also used to study the distribution of profits,
wages, turnover, etc. However, the most common use of this curve is still
in the study of the degree of inequality in the distribution of income and
wealth between countries or between different periods of time. It is a
cumulative percentage curve in which the percentage of items is combined
with the percentage of other things such as wealth, profits, turnover, etc.

While drawing the Lorenz Curve the following procedure is adopted:

(i) The size of items and frequencies are both cumulated and then
percentages are obtained for these various cumulative values.
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(ii) On the X-axis start from 0 to 100 and take the per cent of 
‘cumulative frequencies, 


(iii) On the Y-axis start from 0 to 100 and take the per cent of 
variable. 


(iv) Draw a diagonal line joining 0 with 100. This is known as line 
of equal distribution. Any point on this diagonal shows the same per cent 
‘on X as on Y. 


(v) Plot the various points corresponding to X and Y and join them. 
‘The distribution so obtained, unless it is exactly equal, will always curve 
below the diagonal line. If two curves of distribution are shown on the 
same Lorenz presentation, the curve that is farthest from the diagonal line 
represents the greater inequality. Clearly the line of actual distribution 
‘can never cross the line of equal distribution. 


Illustration 28. The table below gives the number of companies in two areas, A and B, classified according to the amount of profits they earned. Draw their Lorenz curves in the same diagram and interpret them.

Profits earned        No. of Companies
(Rs. '000)          Area A     Area B
     6                 6          2
    25                11         38
    60                13         52
    84                14         28
   105                15         38
   150                17         26
   170                10         12
   400                14          4

(I.C.W.A., 1974)


Solution: CALCULATIONS FOR DRAWING THE LORENZ CURVES

                                          Area A                          Area B
Profits     Cumulative  Cumulative   No. of  Cumulative  Cumulative   No. of  Cumulative  Cumulative
(Rs.'000)   profits     % of         cos.    number      % of         cos.    number      % of
                        profits                          companies                        companies
     6            6         0.6         6         6          6           2         2          1
    25           31         3.1        11        17         17          38        40         20
    60           91         9.1        13        30         30          52        92         46
    84          175        17.5        14        44         44          28       120         60
   105          280        28.0        15        59         59          38       158         79
   150          430        43.0        17        76         76          26       184         92
   170          600        60.0        10        86         86          12       196         98
   400         1000       100.0        14       100        100           4       200        100

[Figure: LORENZ CURVE. Curves for Area A and Area B drawn below the diagonal line of equal distribution; X-axis: per cent of companies (0 to 100); Y-axis: cumulative per cent of profits (0 to 100).]


Since curve B is farthest from the diagonal line it represents greater inequality. 
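The cumulative-percentage steps of the procedure can be sketched in Python for the data of Illustration 28. This is only an illustrative sketch. `lorenz_points` and `area_under` are hypothetical helper names, and the trapezoidal area under each curve is an extra yardstick that the text does not use. The smaller the area, the farther the curve lies below the diagonal, whose own area is 5,000 in these percentage units.

```python
from itertools import accumulate


def lorenz_points(values, freqs):
    """Return (xs, ys): X = cumulative % of frequencies, Y = cumulative %
    of the variable, each list starting at the origin (0, 0)."""
    cum_v = list(accumulate(values))
    cum_f = list(accumulate(freqs))
    xs = [0.0] + [100 * c / cum_f[-1] for c in cum_f]
    ys = [0.0] + [100 * c / cum_v[-1] for c in cum_v]
    return xs, ys


def area_under(xs, ys):
    """Trapezoidal area under the plotted curve (percentage units)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


profits = [6, 25, 60, 84, 105, 150, 170, 400]   # Rs. '000
area_a = [6, 11, 13, 14, 15, 17, 10, 14]
area_b = [2, 38, 52, 28, 38, 26, 12, 4]

xa, ya = lorenz_points(profits, area_a)
xb, yb = lorenz_points(profits, area_b)
print("Area A:", list(zip(xa, ya)))
print("Area B:", list(zip(xb, yb)))
print("areas under curves:", round(area_under(xa, ya), 1), round(area_under(xb, yb), 1))
```

Curve B encloses the smaller area, which agrees with the visual reading that it lies farther from the line of equal distribution.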
MISCELLANEOUS ILLUSTRATIONS 
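Illustration 29. From the following data, calculate the quartile deviation and its coefficient:

Class    Frequency    Class    Frequency
10-20        3        40-50       10
20-30        5        50-60        4
30-40       15        60-70        2

Also calculate the coefficient of variation. (B. Com., Marathwada, 1973)

Solution:
COMPUTATION OF QUARTILE DEVIATION AND ITS COEFFICIENT

Class    f    m.p.   d' = (m - 35)/10    fd'    fd'²    c.f.
10-20    3     15          -2             -6     12       3
20-30    5     25          -1             -5      5       8
30-40   15     35           0              0      0      23
40-50   10     45           1             10     10      33
50-60    4     55           2              8     16      37
60-70    2     65           3              6     18      39
N = 39                               Σfd' = 13   Σfd'² = 61

$Q_1$ = size of $\frac{N}{4}$th item $= \frac{39}{4} = 9.75$th item
$Q_1$ lies in the class 30-40.
$Q_1 = L + \frac{N/4 - c.f.}{f} \times i$
$L = 30$, $N/4 = 9.75$, $c.f. = 8$, $f = 15$, $i = 10$
$Q_1 = 30 + \frac{9.75 - 8}{15} \times 10 = 30 + 1.17 = 31.17$

$Q_3$ = size of $\frac{3N}{4}$th item $= \frac{3 \times 39}{4} = 29.25$th item
$Q_3$ lies in the class 40-50.
$Q_3 = L + \frac{3N/4 - c.f.}{f} \times i$
$L = 40$, $3N/4 = 29.25$, $c.f. = 23$, $f = 10$, $i = 10$
$Q_3 = 40 + \frac{29.25 - 23}{10} \times 10 = 40 + 6.25 = 46.25$

Q.D. $= \frac{Q_3 - Q_1}{2} = \frac{46.25 - 31.17}{2} = \frac{15.08}{2} = 7.54$
Coefficient of Q.D. $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{46.25 - 31.17}{46.25 + 31.17} = \frac{15.08}{77.42} = 0.195$

C.V. $= \frac{\sigma}{\bar X} \times 100$
$\bar X = A + \frac{\Sigma fd'}{N} \times C$, where $A = 35$, $\Sigma fd' = 13$, $N = 39$, $C = 10$
$\bar X = 35 + \frac{13}{39} \times 10 = 35 + 3.33 = 38.33$
$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where $\Sigma fd'^2 = 61$, $\Sigma fd' = 13$, $N = 39$, $C = 10$
$\sigma = \sqrt{\frac{61}{39} - \left(\frac{13}{39}\right)^2} \times 10 = \sqrt{1.564 - 0.111} \times 10 = \sqrt{1.453} \times 10 = 1.205 \times 10 = 12.05$
C.V. $= \frac{12.05}{38.33} \times 100 = 31.44$ per cent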
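Both techniques used in Illustration 29 can be sketched in Python: interpolation of a quartile within its class, and the step-deviation mean and standard deviation. This is a minimal sketch that assumes equal-width classes and divisor N. The names `grouped_quantile` and `step_deviation_stats` are hypothetical.

```python
import math
from itertools import accumulate


def grouped_quantile(lowers, width, freqs, p):
    """Interpolated quantile for equal-width classes:
    Q = L + (p*N - c.f.) / f * i, where c.f. is the cumulative
    frequency of the classes below the quantile class."""
    target = p * sum(freqs)
    cum = list(accumulate(freqs))
    for k, c in enumerate(cum):
        if c >= target:
            cf_before = cum[k - 1] if k > 0 else 0
            return lowers[k] + (target - cf_before) / freqs[k] * width
    raise ValueError("p must lie in (0, 1]")


def step_deviation_stats(mids, freqs, a, c):
    """Mean and SD (divisor N) by the step-deviation method, d' = (m - A) / C."""
    n = sum(freqs)
    d = [(m - a) / c for m in mids]
    s1 = sum(f * x for f, x in zip(freqs, d))        # sum of f d'
    s2 = sum(f * x * x for f, x in zip(freqs, d))    # sum of f d'^2
    mean = a + s1 / n * c
    sd = math.sqrt(s2 / n - (s1 / n) ** 2) * c
    return mean, sd


# Illustration 29
lowers = [10, 20, 30, 40, 50, 60]
freqs = [3, 5, 15, 10, 4, 2]
q1 = grouped_quantile(lowers, 10, freqs, 0.25)
q3 = grouped_quantile(lowers, 10, freqs, 0.75)
mean, sd = step_deviation_stats([15, 25, 35, 45, 55, 65], freqs, a=35, c=10)
print("Q1, Q3, Q.D.:", round(q1, 2), round(q3, 2), round((q3 - q1) / 2, 2))
print("mean, SD, C.V.:", round(mean, 2), round(sd, 2), round(sd / mean * 100, 2))
```

The same `step_deviation_stats` call applies to the other step-deviation tables in this chapter (for example Illustrations 35, 37 and 41). Results may differ from the hand working in the last decimal because no intermediate rounding is done.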
Illustration 30. Using the following data, find the percentage of cases that lie outside the limits indicated by $\bar X \pm \sigma$, $\bar X \pm 2\sigma$ and $\bar X \pm 3\sigma$.

148, 145, 141, 116, 96, 91, 91, 87, 89, 91, 102, 95, 108, 120 and 139. (M.A., Jabalpur, 1974)

Solution: CALCULATION OF MEAN AND STANDARD DEVIATION

  X    d = X - 110     d²    |     X    d = X - 110     d²
 148        38        1,444  |     89       -21          441
 145        35        1,225  |     91       -19          361
 141        31          961  |    102        -8           64
 116         6           36  |     95       -15          225
  96       -14          196  |    108        -2            4
  91       -19          361  |    120        10          100
  91       -19          361  |    139        29          841
  87       -23          529  |
ΣX = 1659        Σd = 9        Σd² = 7149

$\bar X = \frac{\Sigma X}{N} = \frac{1659}{15} = 110.6$
$\sigma = \sqrt{\frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2} = \sqrt{\frac{7149}{15} - \left(\frac{9}{15}\right)^2} = \sqrt{476.6 - 0.36} = \sqrt{476.24} = 21.82$

$\bar X \pm \sigma = 110.6 \pm 21.82$, or 88.78 to 132.42
$\bar X \pm 2\sigma = 110.6 \pm 21.82 \times 2$, or 66.96 to 154.24
$\bar X \pm 3\sigma = 110.6 \pm 21.82 \times 3$, or 45.14 to 176.06

Number of items below 88.78 or above 132.42 = 5 (namely 87, 139, 141, 145 and 148).
Percentage of cases that lie outside the limits $\bar X \pm \sigma = \frac{5}{15} \times 100 = 33.33$ per cent.
Since the lowest item in the series is 87 and the highest 148, no item lies outside $\bar X \pm 2\sigma$ or $\bar X \pm 3\sigma$.
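The counting in Illustration 30 can be checked with a short Python sketch. It uses the fifteen items above and the population standard deviation (divisor N), as the text does. The helper `pct_outside` is a hypothetical name.

```python
import math

data = [148, 145, 141, 116, 96, 91, 91, 87, 89, 91, 102, 95, 108, 120, 139]
n = len(data)
mean = sum(data) / n
# Population standard deviation (divisor N), as used in the text
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)


def pct_outside(k):
    """Percentage of items lying outside mean +/- k * sd."""
    lo, hi = mean - k * sd, mean + k * sd
    outside = sum(1 for x in data if x < lo or x > hi)
    return 100 * outside / n


for k in (1, 2, 3):
    print(f"outside mean +/- {k} sd: {pct_outside(k):.2f} per cent")
```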
Illustration 31. Calculate the coefficient of dispersion from the following series:

Size    Frequency    Size    Frequency
  2         3          8        10
  3         8          9         8
  4        10         10        17
  5        12         11         5
  6        16         12         4
  7        14

Solution. A coefficient of dispersion is any relative measure of dispersion. Since the standard deviation is considered the best measure of dispersion, we compute the coefficient of standard deviation.

CALCULATION OF COEFFICIENT OF DISPERSION

Size    f    d = Size - 7    fd     fd²
  2     3        -5          -15     75
  3     8        -4          -32    128
  4    10        -3          -30     90
  5    12        -2          -24     48
  6    16        -1          -16     16
  7    14         0            0      0
  8    10         1           10     10
  9     8         2           16     32
 10    17         3           51    153
 11     5         4           20     80
 12     4         5           20    100
N = 107                 Σfd = 0   Σfd² = 732

$\bar X = A + \frac{\Sigma fd}{N}$, where $A = 7$, $\Sigma fd = 0$, $N = 107$
$\bar X = 7 + \frac{0}{107} = 7$
$\sigma = \sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2}$, where $\Sigma fd^2 = 732$, $N = 107$, $\Sigma fd = 0$
$\sigma = \sqrt{\frac{732}{107}} = \sqrt{6.841} = 2.616$
Coefficient of standard deviation $= \frac{\sigma}{\bar X} = \frac{2.616}{7} = 0.374$


Illustration 32. From the following observations, prepare a frequency distribution table in ascending order starting with 100-110 (exclusive method) and find the coefficient of variation:

125 108 112 126 110 132 136 130 149 155 

120 130 136 138 125 111 119 125 140 148 

147 137 145 150 142 135 137 132 165 154 

(B. Com., Karnatak, 1967) 


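Solution: FORMATION OF FREQUENCY DISTRIBUTION AND CALCULATION OF COEFFICIENT OF VARIATION

Class      Tally Bars     Frequency   m.p.   d' = (m - 135)/10   fd'   fd'²
100-110    |                  1        105          -3            -3     9
110-120    ||||               4        115          -2            -8    16
120-130    ||||/              5        125          -1            -5     5
130-140    ||||/ ||||/       10        135           0             0     0
140-150    ||||/ |            6        145           1             6     6
150-160    |||                3        155           2             6    12
160-170    |                  1        165           3             3     9
N = 30                                          Σfd' = -1   Σfd'² = 57

C.V. $= \frac{\sigma}{\bar X} \times 100$
$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where $\Sigma fd'^2 = 57$, $\Sigma fd' = -1$, $N = 30$, $C = 10$
$\sigma = \sqrt{\frac{57}{30} - \left(\frac{-1}{30}\right)^2} \times 10 = \sqrt{1.9 - 0.001} \times 10 = 1.378 \times 10 = 13.78$
$\bar X = A + \frac{\Sigma fd'}{N} \times C$, where $A = 135$, $\Sigma fd' = -1$, $N = 30$, $C = 10$
$\bar X = 135 - \frac{1}{30} \times 10 = 135 - 0.33 = 134.67$
C.V. $= \frac{13.78}{134.67} \times 100 = 10.23$ per cent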
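The tallying step of Illustration 32 can be automated. Here is a minimal Python sketch, assuming the exclusive method described in the question: an item equal to an upper limit, such as 110, goes into the next class. The code then applies the same step-deviation formulas with $A = 135$ and $C = 10$.

```python
import math
from collections import Counter

obs = [125, 108, 112, 126, 110, 132, 136, 130, 149, 155,
       120, 130, 136, 138, 125, 111, 119, 125, 140, 148,
       147, 137, 145, 150, 142, 135, 137, 132, 165, 154]

# Exclusive method: class k covers [100 + 10k, 110 + 10k), so 110 -> 110-120
start, width = 100, 10
counts = Counter((x - start) // width for x in obs)
classes = [(start + k * width, start + (k + 1) * width, counts[k])
           for k in range(max(counts) + 1)]

# Step deviations about the assumed mean A = 135 with C = 10
a, c = 135, 10
f = [fr for _, _, fr in classes]
d = [((lo + hi) / 2 - a) / c for lo, hi, _ in classes]
n = sum(f)
s1 = sum(fi * di for fi, di in zip(f, d))
s2 = sum(fi * di * di for fi, di in zip(f, d))
mean = a + s1 / n * c
sd = math.sqrt(s2 / n - (s1 / n) ** 2) * c
cv = sd / mean * 100

for lo, hi, fr in classes:
    print(f"{lo}-{hi}: {fr}")
print(f"mean = {mean:.2f}, SD = {sd:.2f}, C.V. = {cv:.2f} per cent")
```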


Illustration 33. An association doing charity work decided to give old age pensions to people over sixty years of age. The scales of pensions were fixed as follows:

Age group 60 to 65: Rs. 25 per month.
Age group 65 to 70: Rs. 30 per month.
Age group 70 to 75: Rs. 35 per month.
Age group 75 to 80: Rs. 40 per month.
Age group 80 to 85: Rs. 45 per month.

The ages of 25 persons who secured the pension right are given below. Calculate the monthly average pension payable and the standard deviation:

74 62 84 72 61 83 72 81 64 71 63 61 60
67 74 64 79 73 75 76 69 68 78 66 67
(C.A., May, 1969)
Solution: CLASSIFYING THE ABOVE DATA

Age group    Tally Bars    Frequency
60 to 65     ||||/ ||          7
65 to 70     ||||/             5
70 to 75     ||||/ |           6
75 to 80     ||||              4
80 to 85     |||               3
Total                         25

CALCULATION OF MONTHLY AVERAGE PENSION PAYABLE AND THE STANDARD DEVIATION

Scale of pension (Rs.) X    f    d' = (X - 35)/5    fd'    fd'²
        25                  7          -2            -14     28
        30                  5          -1             -5      5
        35                  6           0              0      0
        40                  4           1              4      4
        45                  3           2              6     12
                       N = 25                  Σfd' = -9  Σfd'² = 49

$\bar X = A + \frac{\Sigma fd'}{N} \times C$, where $A = 35$, $\Sigma fd' = -9$, $N = 25$, $C = 5$
$\bar X = 35 - \frac{9}{25} \times 5 = 35 - 1.8 = 33.2$
$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where $\Sigma fd'^2 = 49$, $\Sigma fd' = -9$, $N = 25$, $C = 5$
Substituting the values, we have
$\sigma = \sqrt{\frac{49}{25} - \left(\frac{-9}{25}\right)^2} \times 5 = \sqrt{1.96 - 0.1296} \times 5 = 1.353 \times 5 = 6.765$

Thus the monthly average pension is Rs. 33.2 and the standard deviation Rs. 6.76.
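The classification and pension calculation can be sketched in Python. The code uses the 25 ages listed in the illustration and computes the mean and SD (divisor N) directly from the individual pensions. This gives the same result as the step-deviation table. The helper name `pension_stats` is hypothetical. Assigning an age that falls exactly on a class boundary (such as 65) to the higher class is an assumption, although no such age occurs in these data.

```python
import math

ages = [74, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60,
        67, 74, 64, 79, 73, 75, 76, 69, 68, 78, 66, 67]


def pension_stats(ages, rates, start=60, width=5):
    """Mean and SD (divisor N) of monthly pensions, where rates[k] is the
    pension for ages in [start + k*width, start + (k+1)*width)."""
    pensions = [rates[(age - start) // width] for age in ages]
    n = len(pensions)
    mean = sum(pensions) / n
    sd = math.sqrt(sum((p - mean) ** 2 for p in pensions) / n)
    return mean, sd


print(pension_stats(ages, [25, 30, 35, 40, 45]))   # scale of Illustration 33
print(pension_stats(ages, [20, 25, 30, 35, 40]))   # scale of Illustration 45
```

Shifting every rate by Rs. 5 moves the mean by Rs. 5 but leaves the standard deviation unchanged. This is why both scales give the same SD.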
Illustration 34. The first of two sub-groups has 100 items with mean 15 and standard deviation 3. If the whole group has 250 items with mean 15.6 and standard deviation $\sqrt{13.44}$, find the standard deviation of the second sub-group. (Delhi, 1969)

Solution. $N_1 = 100$, $\bar X_1 = 15$, $\sigma_1 = 3$
$N = 250$, $\bar X_{12} = 15.6$, $\sigma_{12} = \sqrt{13.44}$
Since $N_1 + N_2 = 250$, $N_2 = 250 - 100 = 150$.

$\bar X_{12} = \frac{N_1\bar X_1 + N_2\bar X_2}{N_1 + N_2}$ or $15.6 = \frac{100 \times 15 + 150\bar X_2}{250}$
$3900 = 1500 + 150\bar X_2$
$150\bar X_2 = 2400$ or $\bar X_2 = 16$

$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1d_1^2 + N_2d_2^2}{N_1 + N_2}}$
$d_1 = \bar X_1 - \bar X_{12} = 15 - 15.6 = -0.6$
$d_2 = \bar X_2 - \bar X_{12} = 16 - 15.6 = 0.4$

Squaring and substituting,
$13.44 = \frac{100(3)^2 + 150\sigma_2^2 + 100(-0.6)^2 + 150(0.4)^2}{250} = \frac{900 + 150\sigma_2^2 + 36 + 24}{250}$
$3360 = 960 + 150\sigma_2^2$
$150\sigma_2^2 = 3360 - 960 = 2400$
$\sigma_2^2 = 16$ or $\sigma_2 = 4$

The standard deviation of the second sub-group is 4.


Illustration 35. Compute the standard deviation of the distance travelled by 260 farmers to buy certain daily necessities:

Km. travelled (mid-values):   1    3    5    7    9   11   13   15   17   19
Number of farmers:           19   52   70   39   24   21   14   12    8    1

(B.A., Madurai, 1970)
Solution. CALCULATION OF STANDARD DEVIATION

Km. travelled (m)   No. of farmers (f)   d' = (m - 11)/2    fd'     fd'²
        1                  19                 -5            -95      475
        3                  52                 -4           -208      832
        5                  70                 -3           -210      630
        7                  39                 -2            -78      156
        9                  24                 -1            -24       24
       11                  21                  0              0        0
       13                  14                  1             14       14
       15                  12                  2             24       48
       17                   8                  3             24       72
       19                   1                  4              4       16
                       N = 260                        Σfd' = -549  Σfd'² = 2,267

$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where $\Sigma fd'^2 = 2267$, $\Sigma fd' = -549$, $N = 260$, $C = 2$
$\sigma = \sqrt{\frac{2267}{260} - \left(\frac{-549}{260}\right)^2} \times 2 = \sqrt{8.719 - 4.459} \times 2 = \sqrt{4.260} \times 2 = 2.064 \times 2 = 4.128$ km


Illustration 36. (a) The mean and standard deviation of a normal distribution are 60 and 5 respectively. Find the inter-quartile range and the mean deviation of the distribution.

Solution. Given $\bar X = 60$, $\sigma = 5$. We have to find the inter-quartile range and the mean deviation. For a normal distribution we use the approximate relationships:

M.D. $= \frac{4}{5}\sigma = \frac{4}{5} \times 5 = 4$
Q.D. $= \frac{2}{3}\sigma = \frac{2}{3} \times 5 = \frac{10}{3}$
$\frac{Q_3 - Q_1}{2} = \frac{10}{3}$
$Q_3 - Q_1 = \frac{20}{3} = 6.67$
Hence inter-quartile range = 6.67.
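The rules Q.D. $= \frac{2}{3}\sigma$ and M.D. $= \frac{4}{5}\sigma$ are rounded versions of exact normal-curve constants. Here is a short Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+) that compares the two for $\bar X = 60$, $\sigma = 5$. The exact mean deviation of a normal distribution, $\sigma\sqrt{2/\pi}$, is included only for comparison.

```python
import math
from statistics import NormalDist

dist = NormalDist(mu=60, sigma=5)

iqr_exact = dist.inv_cdf(0.75) - dist.inv_cdf(0.25)   # about 1.349 * sigma
md_exact = dist.stdev * math.sqrt(2 / math.pi)        # about 0.798 * sigma
iqr_rule = 2 * (2 / 3) * dist.stdev                   # IQR = 2 * Q.D., Q.D. = (2/3) sigma
md_rule = (4 / 5) * dist.stdev                        # M.D. = (4/5) sigma

print(f"IQR: exact {iqr_exact:.3f}, rule {iqr_rule:.3f}")
print(f"M.D.: exact {md_exact:.3f}, rule {md_rule:.3f}")
```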
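(b) The mean and standard deviation of two distributions of 100 and 150 items are 50, 5 and 40, 6 respectively. Find the mean and standard deviation of all the 250 items taken together. (M. Com., Delhi, 1970)

Solution. Combined mean $\bar X_{12} = \frac{N_1\bar X_1 + N_2\bar X_2}{N_1 + N_2}$
$N_1 = 100$, $N_2 = 150$, $\bar X_1 = 50$, $\bar X_2 = 40$
$\bar X_{12} = \frac{100 \times 50 + 150 \times 40}{100 + 150} = \frac{5000 + 6000}{250} = 44$

Combined standard deviation
$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1d_1^2 + N_2d_2^2}{N_1 + N_2}}$
$N_1 = 100$, $\sigma_1 = 5$, $N_2 = 150$, $\sigma_2 = 6$
$d_1 = \bar X_1 - \bar X_{12} = 50 - 44 = 6$
$d_2 = \bar X_2 - \bar X_{12} = 40 - 44 = -4$
$\sigma_{12} = \sqrt{\frac{100(5)^2 + 150(6)^2 + 100(6)^2 + 150(-4)^2}{100 + 150}} = \sqrt{\frac{2500 + 5400 + 3600 + 2400}{250}} = \sqrt{\frac{13900}{250}} = \sqrt{55.6} = 7.46$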
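The combined-mean and combined-SD formulas extend to any number of groups. Here is a minimal Python sketch, assuming each group is described by (N, mean, SD) with SD computed using divisor N. The helper name `combine` is hypothetical.

```python
import math


def combine(groups):
    """groups: iterable of (N, mean, sd). Returns (N, mean, sd) of the pooled
    group, using sigma^2 = sum(N_i * (sigma_i^2 + d_i^2)) / sum(N_i),
    where d_i = mean_i - combined mean."""
    groups = list(groups)
    n = sum(g[0] for g in groups)
    mean = sum(g[0] * g[1] for g in groups) / n
    var = sum(g[0] * (g[2] ** 2 + (g[1] - mean) ** 2) for g in groups) / n
    return n, mean, math.sqrt(var)


print(combine([(100, 50, 5), (150, 40, 6)]))   # Illustration 36(b)
```

Run the other way, the same formula checks Illustration 34: combining (100, 15, 3) with (150, 16, 4) gives a mean of 15.6 and a variance of 13.44.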


Illustration 37. The figures below show the amount spent on entertaining customers by each of a firm's 92 salesmen in a given month:

Amount spent (Rs.)                    No. of salesmen
Less than 2                                  5
2 or more but less than 4                    6
4 or more but less than 6                    8
6 or more but less than 8                   14
8 or more but less than 10                  21
10 or more but less than 12                 12
12 or more but less than 14                  9
14 or more but less than 16                  9
16 or more but less than 18                  6
18 or more but less than 20                  2

Calculate the arithmetic mean, standard deviation and coefficient of variation. (B.A., Delhi)

Solution: CALCULATION OF MEAN, STANDARD DEVIATION AND COEFFICIENT OF VARIATION

Amount spent (Rs.)    f    m.p. (m)   d' = (m - 11)/2    fd'    fd'²
     0-2              5        1            -5           -25     125
     2-4              6        3            -4           -24      96
     4-6              8        5            -3           -24      72
     6-8             14        7            -2           -28      56
     8-10            21        9            -1           -21      21
    10-12            12       11             0             0       0
    12-14             9       13             1             9       9
    14-16             9       15             2            18      36
    16-18             6       17             3            18      54
    18-20             2       19             4             8      32
                 N = 92                          Σfd' = -69  Σfd'² = 501

$\bar X = A + \frac{\Sigma fd'}{N} \times C$, where $A = 11$, $\Sigma fd' = -69$, $N = 92$, $C = 2$
$\bar X = 11 - \frac{69}{92} \times 2 = 11 - 1.5 = 9.5$
Hence average amount spent = Rs. 9.5
$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C = \sqrt{\frac{501}{92} - \left(\frac{-69}{92}\right)^2} \times 2 = \sqrt{5.446 - 0.5625} \times 2 = \sqrt{4.883} \times 2 = 2.21 \times 2 = 4.42$
C.V. $= \frac{\sigma}{\bar X} \times 100 = \frac{4.42}{9.5} \times 100 = 46.53$ per cent


Illustration 38. The mean and standard deviation of the following distribution are 31 and 15.9 respectively. After taking step deviations, the distribution is:

d'   -3   -2   -1    0    1    2    3
f    10   15   25   25   10   10    5

Determine the actual class intervals. (B. Com., Delhi, 1974)


Solution. To ascertain the class groups we need two things: the class interval and the assumed mean. The formula for the standard deviation gives the class interval, and the formula for the mean gives the assumed mean.

COMPUTATIONS FOR DETERMINING CLASS GROUPS

 d'     f     fd'    fd'²
 -3    10    -30      90
 -2    15    -30      60
 -1    25    -25      25
  0    25      0       0
  1    10     10      10
  2    10     20      40
  3     5     15      45
N = 100   Σfd' = -40   Σfd'² = 270

$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$
$\Sigma fd'^2 = 270$, $\Sigma fd' = -40$, $N = 100$ and $\sigma = 15.9$
$15.9 = \sqrt{\frac{270}{100} - \left(\frac{-40}{100}\right)^2} \times C = \sqrt{2.7 - 0.16} \times C = \sqrt{2.54} \times C = 1.59C$
$C = \frac{15.9}{1.59} = 10$

$\bar X = A + \frac{\Sigma fd'}{N} \times C$
$31 = A + \frac{-40}{100} \times 10 = A - 4$
$A = 31 + 4 = 35$

Hence the assumed mean from which deviations have been taken is 35, and the class interval is 10. The lower and upper limits of this class would be 30 and 40. This class corresponds to zero (0) in the question. The class preceding it would be 20-30 and the class succeeding it 40-50, and likewise we get the other classes. Thus the actual class groups are as follows:

Class group    Frequency    Class group    Frequency
   0-10           10           40-50          10
  10-20           15           50-60          10
  20-30           25           60-70           5
  30-40           25
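The reverse working of Illustration 38 can be sketched in Python. Rounding the recovered class interval to a whole number mirrors the text's rounding of $\sqrt{2.54}$ to 1.59. This rounding is an assumption, since the given SD of 15.9 is itself rounded.

```python
import math

d = [-3, -2, -1, 0, 1, 2, 3]
f = [10, 15, 25, 25, 10, 10, 5]
mean, sd = 31, 15.9                          # given in the question

n = sum(f)
s1 = sum(fi * di for fi, di in zip(f, d))    # sum of f d'
s2 = sum(fi * di * di for fi, di in zip(f, d))
spread = math.sqrt(s2 / n - (s1 / n) ** 2)   # SD in step units, about 1.594
c = round(sd / spread)                       # class interval (assumed whole number)
a = mean - s1 / n * c                        # assumed mean, from X = A + (sum f d'/N) C

# Each step deviation k corresponds to the class centred on A + k*C
classes = [(a + k * c - c / 2, a + k * c + c / 2, fk) for k, fk in zip(d, f)]
for lo, hi, fk in classes:
    print(f"{lo:g}-{hi:g}: {fk}")
```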


Illustration 39. You are in charge of rationing in a state affected by food shortage. The following reports arrive from your local investigators:

Daily calorific value of food available per adult during the current period:

Area      Mean      Standard Deviation
 X        2,500            500
 Y        2,200            300

The estimated requirement of an adult is taken as 3,000 calories daily and the absolute minimum as 1,250. Comment on the reported figures and determine which area, in your opinion, needs more urgent attention.

Solution. In a normal population, $\bar X \pm 3\sigma$ covers 99.73 per cent of the cases, i.e., almost all of them. On the basis of the information given, the limits would be:
Area X: $\bar X \pm 3\sigma = 2500 \pm 3 \times 500$, i.e., 1,000 to 4,000
Area Y: $\bar X \pm 3\sigma = 2200 \pm 3 \times 300$, i.e., 1,300 to 3,100

These limits show that in area X some people do not get even 1,250 calories, which is regarded as the bare minimum, whereas in area Y everybody gets more than the minimum. Hence area X needs more urgent attention.
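The $\pm 3\sigma$ comparison can be sketched in Python. The text works only with the limits. The normal-tail shares below the 1,250-calorie minimum are an extra illustration that assumes intake is normally distributed in each area. They use `statistics.NormalDist` from the standard library, and the helper names are hypothetical.

```python
from statistics import NormalDist

areas = {"X": (2500, 500), "Y": (2200, 300)}
MINIMUM = 1250   # absolute minimum calories per adult


def three_sigma_limits(mu, sigma):
    return mu - 3 * sigma, mu + 3 * sigma


def share_below(mu, sigma, cutoff=MINIMUM):
    """Proportion below the cutoff, assuming a normal distribution."""
    return NormalDist(mu, sigma).cdf(cutoff)


for name, (mu, sigma) in areas.items():
    lo, hi = three_sigma_limits(mu, sigma)
    print(f"Area {name}: {lo} to {hi}; below {MINIMUM}: "
          f"{100 * share_below(mu, sigma):.2f} per cent")
```

Under the normality assumption, area Y also has a very small share below the minimum (under 0.1 per cent), compared with about 0.6 per cent in area X. This supports the conclusion that area X needs attention first.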

Illustration 40. A collar manufacturer is considering the production of a new style of collar to attract young men. The following statistics of neck circumferences are available, based upon measurements of a typical group of college students:

Mid-value (in inches):   12.0   12.5   13.0   13.5   14.0   14.5   15.0   15.5   16.0
No. of students:            2     16     36     60     76     37     18      3      2

Compute the standard deviation, and use the criterion $\bar X \pm 3\sigma$, where $\sigma$ is the standard deviation and $\bar X$ is the arithmetic mean, to determine the largest and smallest collar sizes he should make to meet the needs of practically all his customers. Bear in mind that collars are worn, on average, ½ inch longer than the neck size.

Solution. CALCULATION OF MEAN AND STANDARD DEVIATION

Mid-value (m)   No. of students (f)   d' = (m - 14)/0.5    fd'    fd'²
    12.0               2                    -4              -8      32
    12.5              16                    -3             -48     144
    13.0              36                    -2             -72     144
    13.5              60                    -1             -60      60
    14.0              76                     0               0       0
    14.5              37                     1              37      37
    15.0              18                     2              36      72
    15.5               3                     3               9      27
    16.0               2                     4               8      32
                  N = 250                           Σfd' = -98   Σfd'² = 548

$\bar X = A + \frac{\Sigma fd'}{N} \times C$, where $A = 14$, $\Sigma fd' = -98$, $N = 250$, $C = 0.5$
$\bar X = 14 - \frac{98}{250} \times 0.5 = 14 - 0.196 = 13.8$
$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C = \sqrt{\frac{548}{250} - \left(\frac{-98}{250}\right)^2} \times 0.5 = \sqrt{2.192 - 0.154} \times 0.5 = \sqrt{2.038} \times 0.5 = 1.428 \times 0.5 = 0.714$

Largest and smallest neck sizes
$= \bar X \pm 3\sigma = 13.8 \pm 3(0.714) = 13.8 \pm 2.142$, i.e., 11.658 and 15.942

Since collars are worn, on average, ½ inch longer than the neck size, we add 0.5 to these limits. Thus the smallest and largest collar sizes should be:
$(11.658 + 0.5)$ and $(15.942 + 0.5)$, i.e., 12.158 and 16.442

Thus the smallest collar should be 12.2 inches long and the largest 16.4 inches long.


Illustration 41. Find the standard deviation and coefficient of variation from the following data:

Interval      Frequency     Interval      Frequency
3.00-3.25         6         4.00-4.25        47
3.25-3.50        19         4.25-4.50        29
3.50-3.75        35         4.50-4.75        15
3.75-4.00        44         4.75-5.00         5

(B.A. Hons. Econ., Delhi, 1972)
Solution. CALCULATION OF STANDARD DEVIATION AND COEFFICIENT OF VARIATION

Interval      f     m.p. (m)   d' = (m - 3.875)/0.25    fd'    fd'²
3.00-3.25     6      3.125            -3                -18     54
3.25-3.50    19      3.375            -2                -38     76
3.50-3.75    35      3.625            -1                -35     35
3.75-4.00    44      3.875             0                  0      0
4.00-4.25    47      4.125             1                 47     47
4.25-4.50    29      4.375             2                 58    116
4.50-4.75    15      4.625             3                 45    135
4.75-5.00     5      4.875             4                 20     80
         N = 200                               Σfd' = +79   Σfd'² = 543

Calculation of Standard Deviation
$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where $\Sigma fd'^2 = 543$, $\Sigma fd' = 79$, $N = 200$, $C = 0.25$
$\sigma = \sqrt{\frac{543}{200} - \left(\frac{79}{200}\right)^2} \times 0.25 = \sqrt{2.715 - 0.156} \times 0.25 = \sqrt{2.559} \times 0.25 = 1.5997 \times 0.25 = 0.3999$, or 0.4

Calculation of Coefficient of Variation
C.V. $= \frac{\sigma}{\bar X} \times 100$
$\bar X = A + \frac{\Sigma fd'}{N} \times C$, where $A = 3.875$, $\Sigma fd' = 79$, $N = 200$, $C = 0.25$
$\bar X = 3.875 + \frac{79}{200} \times 0.25 = 3.875 + 0.099 = 3.974$
C.V. $= \frac{0.4}{3.974} \times 100 = 10.07$ per cent


Illustration 42. A student calculated the arithmetic mean and standard deviation of a series of 20 items as 20 cm and 5 cm respectively. While calculating them, however, he misread an item 13 as 30. Find the correct arithmetic mean and standard deviation. (B. Com., Delhi, 1972)

Solution: Calculation of Correct Mean
$\bar X = \frac{\Sigma X}{N}$ or $N\bar X = \Sigma X$
$N = 20$, $\bar X = 20$
$\Sigma X = 20 \times 20 = 400$
Correct $\Sigma X = 400 - 30 + 13 = 383$
Correct $\bar X = \frac{383}{20} = 19.15$

Calculation of Correct Standard Deviation
$\sigma^2 = \frac{\Sigma X^2}{N} - \bar X^2$
$25 = \frac{\Sigma X^2}{20} - (20)^2$
$25 \times 20 = \Sigma X^2 - 400 \times 20$
$\Sigma X^2 = 500 + 8000 = 8500$
Correct $\Sigma X^2 = 8500 - (30)^2 + (13)^2 = 8500 - 900 + 169 = 7769$
Correct $\sigma = \sqrt{\frac{\text{Correct } \Sigma X^2}{N} - (\text{Correct } \bar X)^2} = \sqrt{\frac{7769}{20} - (19.15)^2} = \sqrt{388.45 - 366.72} = \sqrt{21.73} = 4.66$

Thus the correct mean is 19.15 and the correct standard deviation is 4.66.
Illustration 43. Find the missing information from the following:

                      Group I    Group II    Group III    Combined
Number                   50          ?           90           200
Standard Deviation        6          7            ?           7.746
Mean                    113          ?          115           116

(B. Com., Delhi, 1973)

Solution:
Finding Number of Observations in the Second Group
Let $N_1$, $N_2$, $N_3$ denote the number of observations in the first, second and third group respectively. We are given $N_1 + N_2 + N_3 = 200$.
$N_1 = 50$, $N_3 = 90$, so $N_1 + N_3 = 140$
$N_2 = 200 - 140 = 60$

Finding Mean of Second Group
Let $\bar X_1$, $\bar X_2$, $\bar X_3$ denote the mean of the first, second and third group respectively.
$\bar X_{123} = \frac{N_1\bar X_1 + N_2\bar X_2 + N_3\bar X_3}{N_1 + N_2 + N_3}$
$\bar X_{123} = 116$, $N_1 + N_2 + N_3 = 200$, $\bar X_1 = 113$, $\bar X_3 = 115$
We have to find $\bar X_2$. Substituting the given values,
$116 = \frac{50(113) + 60\bar X_2 + 90(115)}{200}$
$116 \times 200 = 5650 + 60\bar X_2 + 10350$
$60\bar X_2 = 23200 - 16000 = 7200$, so $\bar X_2 = 120$

Finding Standard Deviation of Third Group
$\sigma_{123} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 + N_1d_1^2 + N_2d_2^2 + N_3d_3^2}{N_1 + N_2 + N_3}}$
$\sigma_{123} = 7.746$, $N_1 = 50$, $\sigma_1 = 6$, $N_2 = 60$, $\sigma_2 = 7$, $N_3 = 90$
$d_1 = \bar X_1 - \bar X_{123} = 113 - 116 = -3$
$d_2 = \bar X_2 - \bar X_{123} = 120 - 116 = 4$
$d_3 = \bar X_3 - \bar X_{123} = 115 - 116 = -1$
Substituting the values,
$7.746 = \sqrt{\frac{50(6)^2 + 60(7)^2 + 90\sigma_3^2 + 50(-3)^2 + 60(4)^2 + 90(-1)^2}{200}} = \sqrt{\frac{1800 + 2940 + 90\sigma_3^2 + 450 + 960 + 90}{200}} = \sqrt{\frac{6240 + 90\sigma_3^2}{200}}$
Squaring, and using $(7.746)^2 = 60$ (approximately),
$60 \times 200 = 12000 = 6240 + 90\sigma_3^2$
$90\sigma_3^2 = 12000 - 6240 = 5760$
$\sigma_3^2 = 64$ or $\sigma_3 = 8$

Thus the missing values are $N_2 = 60$, $\bar X_2 = 120$, $\sigma_3 = 8$.
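Solving for a missing group's mean or SD is the combined-group formula run backwards. Here is a minimal Python sketch with hypothetical helper names `missing_mean` and `missing_sd`. It assumes all SDs use divisor N, and that the missing group's size and mean are found first, as in the text.

```python
import math


def missing_mean(n_total, mean_total, known, n_missing):
    """known: list of (N, mean). Solves N * mean_total = sum(N_i * mean_i)."""
    return (n_total * mean_total - sum(n * m for n, m in known)) / n_missing


def missing_sd(n_total, mean_total, sd_total, known, n_missing, mean_missing):
    """known: list of (N, mean, sd). Solves
    N * sd_total^2 = sum(N_i * (sd_i^2 + (mean_i - mean_total)^2))
    for the SD of the remaining group."""
    total_ss = n_total * sd_total ** 2
    known_ss = sum(n * (s ** 2 + (m - mean_total) ** 2) for n, m, s in known)
    rest = total_ss - known_ss - n_missing * (mean_missing - mean_total) ** 2
    return math.sqrt(rest / n_missing)


# Illustration 43: N2 = 200 - 50 - 90 = 60
mean2 = missing_mean(200, 116, [(50, 113), (90, 115)], 60)
sd3 = missing_sd(200, 116, 7.746, [(50, 113, 6), (60, mean2, 7)], 90, 115)
print(mean2, round(sd3, 3))
```

The same two helpers reproduce Illustration 44: `missing_mean(100, 8, [(50, 10)], 50)` gives 6, and `missing_sd(100, 8, math.sqrt(10.5), [(50, 10, 2)], 50, 6)` gives 3.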
Illustration 44. For a group containing 100 observations, the arithmetic mean and standard deviation are 8 and $\sqrt{10.5}$. For 50 observations selected from these 100 observations, the mean and the standard deviation are 10 and 2 respectively. Find the arithmetic mean and the standard deviation of the other half. (B.A. Hons. Econ., Delhi, 1973)

Solution. $N_1 + N_2 = 100$, $\bar X_{12} = 8$, $\sigma_{12} = \sqrt{10.5}$
$N_1 = 50$, $\bar X_1 = 10$, $\sigma_1 = 2$
We have to find the mean and standard deviation of the other half, i.e., $\bar X_2$ and $\sigma_2$, with $N_2 = 50$.
$\bar X_{12} = \frac{N_1\bar X_1 + N_2\bar X_2}{N_1 + N_2}$
$8 = \frac{50 \times 10 + 50\bar X_2}{100}$
$800 = 500 + 50\bar X_2$
$50\bar X_2 = 300$, so $\bar X_2 = 6$

$\sigma_{12}^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1d_1^2 + N_2d_2^2}{N_1 + N_2}$
$d_1 = \bar X_1 - \bar X_{12} = 10 - 8 = 2$
$d_2 = \bar X_2 - \bar X_{12} = 6 - 8 = -2$
$10.5 = \frac{50(2)^2 + 50\sigma_2^2 + 50(2)^2 + 50(-2)^2}{100}$
$10.5 \times 100 = 200 + 50\sigma_2^2 + 200 + 200$
$50\sigma_2^2 = 1050 - 600 = 450$
$\sigma_2^2 = 9$ or $\sigma_2 = 3$

Hence the mean of the remaining 50 items is 6 and their standard deviation is 3.


Illustration 45. A charitable organisation decided to give old age pensions to people over sixty years of age. The scale of pensions was fixed as follows:

Age group 60-65: Rs. 20 per month
Age group 65-70: Rs. 25 per month
Age group 70-75: Rs. 30 per month
Age group 75-80: Rs. 35 per month
Age group 80-85: Rs. 40 per month

The ages of 25 persons who secured the pension right are given below:

74 62 84 72 61 83 72 81 64 71 63 61 60
67 74 64 79 73 75 76 69 68 78 66 67

Calculate the monthly average pension payable per person and the standard deviation. (B. Com., Delhi, 1975)


Solution. To find the average pension payable and the standard deviation, let us first prepare the age distribution.

Age (years)    Tally bars    Frequency
  60-65        ||||/ ||          7
  65-70        ||||/             5
  70-75        ||||/ |           6
  75-80        ||||              4
  80-85        |||               3
Total                           25

CALCULATION OF MEAN AND STANDARD DEVIATION

Rate of pension (Rs.) X    f    d' = (X - 30)/5    fd'    fd'²
        20                 7          -2           -14     28
        25                 5          -1            -5      5
        30                 6           0             0      0
        35                 4           1             4      4
        40                 3           2             6     12
                      N = 25                 Σfd' = -9  Σfd'² = 49

$\bar X = A + \frac{\Sigma fd'}{N} \times C = 30 - \frac{9}{25} \times 5 = 30 - 1.8 = 28.2$
$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C = \sqrt{\frac{49}{25} - \left(\frac{-9}{25}\right)^2} \times 5 = \sqrt{1.96 - 0.1296} \times 5 = 1.353 \times 5 = 6.765$

Thus the monthly average pension payable per person is Rs. 28.2 and the standard deviation Rs. 6.76.


Illustration 46. You are given the following distribution of monthly income per family. Calculate the mean ($\bar X$) and standard deviation ($s$) for this distribution.

Monthly income (Rs.)   No. of families    Monthly income (Rs.)   No. of families
      100-120                30                 200-240                15
      120-160                25                 240-280                10
      160-200                20

What percentage of the families falls in the interval $\bar X - s$ to $\bar X + s$?
(B.A. Hons. Econ., Delhi, 1975)

Solution: CALCULATION OF $\bar X$ AND $s$ (STANDARD DEVIATION)

Monthly income (Rs.)   Mid-point (m)    f    d' = (m - 180)/10    fd'     fd'²
      100-120              110         30          -7             -210    1,470
      120-160              140         25          -4             -100      400
      160-200              180         20           0                0        0
      200-240              220         15           4               60      240
      240-280              260         10           8               80      640
                                   N = 100               Σfd' = -170   Σfd'² = 2,750

$\bar X = A + \frac{\Sigma fd'}{N} \times C = 180 - \frac{170}{100} \times 10 = 180 - 17 = 163$
$s = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C = \sqrt{\frac{2750}{100} - \left(\frac{-170}{100}\right)^2} \times 10 = \sqrt{27.5 - 2.89} \times 10 = \sqrt{24.61} \times 10 = 4.96 \times 10 = 49.6$

Assuming the families are spread evenly within each class:
$\bar X + s = 163 + 49.6 = 212.6$
Within 200-240 (a class width of 40) there are 15 families, so the number of families covered up to 212.6 in that class $= \frac{15}{40} \times 12.6 = 4.725$.
Families with income up to $\bar X + s$ $= 30 + 25 + 20 + 4.725 = 79.725$, i.e., 79.725 per cent.

$\bar X - s = 163 - 49.6 = 113.4$
Within 100-120 (a class width of 20) there are 30 families, so the number of families with income below 113.4 $= \frac{30}{20} \times 13.4 = 20.1$, i.e., 20.1 per cent.

Thus the percentage of families falling in the interval $\bar X - s$ to $\bar X + s$ $= 79.725 - 20.1 = 59.625$, i.e., about 59.6 per cent.
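Illustration 46 has unequal class widths, so the interpolation has to use each class's own width. Here is a minimal Python sketch, assuming (as the solution does) that families are spread evenly within each class. The helper name `count_below` is hypothetical. The mean and SD are computed directly from the mid-points rather than by step deviations, which gives the same result.

```python
import math

classes = [(100, 120, 30), (120, 160, 25), (160, 200, 20),
           (200, 240, 15), (240, 280, 10)]


def count_below(x):
    """Families with income below x, interpolating linearly within a class."""
    total = 0.0
    for lo, hi, f in classes:
        if x >= hi:
            total += f
        elif x > lo:
            total += f * (x - lo) / (hi - lo)
    return total


n = sum(f for _, _, f in classes)
mids = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]
mean = sum(f * m for f, m in zip(freqs, mids)) / n
s = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n)
share = 100 * (count_below(mean + s) - count_below(mean - s)) / n
print(f"mean = {mean:.1f}, s = {s:.2f}, within mean +/- s: {share:.1f} per cent")
```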


Which Measure of Dispersion to Use?

As with measures of central value, the choice of a suitable measure of variation depends on the following factors:

1. The type of data available. If the data are few in number or contain extreme values, avoid the standard deviation. If they are generally skewed, avoid the mean deviation as well. If they have gaps around the quartiles, the quartile deviation should be avoided. If there are open-end classes, the quartile measure of dispersion should be preferred.

2. The purpose of investigation. In an elementary treatment of a statistical series, where a measure of variability is wanted only for its own sake, any of three measures would be acceptable: the range, the quartile deviation or the average deviation. The average deviation would probably be superior. In usual practice, however, the measure of variability is employed in further statistical analysis. For that purpose, the standard deviation is by far the most popular. It is free from several of the defects from which the other measures suffer, and it lends itself to the analysis of variability in terms of the normal curve of error.* Practically all advanced statistical methods deal with variability and centre around the standard deviation. Hence, unless the circumstances warrant some other measure, we should use the standard deviation to measure variability.


* Please refer to Chapter on Theoretical Distribution.


SKEWNESS


In the previous two chapters we discussed measures of central tendency and variability. These measures, however, do not reveal the entire story. Two other comparable characteristics, called skewness and kurtosis, help us to understand a distribution. Two distributions may have the same mean and standard deviation but differ widely in their overall appearance, as can be seen from the following:

[Figure: two frequency curves, a SYMMETRICAL DISTRIBUTION and an ASYMMETRICAL DISTRIBUTION, drawn over the same horizontal scale from 0 to 30.]


In both these distributions the values of the mean and standard deviation are the same ($\bar X = 15$, $\sigma = 6$). This does not imply, however, that the distributions are alike in nature. The distribution on the left-hand side is symmetrical, whereas the distribution on the right-hand side is asymmetrical, or skewed. Measures of skewness help us to distinguish between different types of distributions.

The term 'skewness' refers to lack of symmetry. When a distribution is not symmetrical (i.e., is asymmetrical), it is called a skewed distribution. Any measure of skewness indicates how the items in a particular distribution are distributed compared with a symmetrical (or normal) distribution. If, for example, skewness is positive, the frequencies in the distribution are spread out over a greater range of values at the high-value end of the curve (the right-hand side) than at the low-value end. If the curve is normal, the spread will be the same on both sides of the centre point, and the mean, median and mode will all have the same value. The concept of skewness gains importance from the fact that statistical theory is often based upon the assumption of a normal distribution. A measure of skewness is, therefore, necessary to guard against the consequences of this assumption.

The concept of skewness will be clear from the following example.

Let us take the following three distributions:

   Distribution A           Distribution B           Distribution C
 X     f     fX           X     f     fX           X     f     fX
 1     2      2           1     6      6           1     1      1
 2     3      6           2     7     14           2     2      4
 3     5     15           3     9     27           3     4     12
 4     7     28           4     8     32           4     5     20
 5     8     40           5     7     35           5     7     35
 6     7     42           6     6     36           6     9     54
 7     5     35           7     4     28           7    10     70
 8     3     24           8     2     16           8     7     56
 9     2     18           9     1      9           9     5     45
N = 42   ΣfX = 210       N = 50   ΣfX = 203       N = 50   ΣfX = 297

Distribution A: $\bar X = \frac{\Sigma fX}{N} = \frac{210}{42} = 5.00$; Mode = 5, Median = 5
Distribution B: $\bar X = \frac{\Sigma fX}{N} = \frac{203}{50} = 4.06$; Mode = 3, Median = 4
Distribution C: $\bar X = \frac{\Sigma fX}{N} = \frac{297}{50} = 5.94$; Mode = 7, Median = 6

The above illustration shows that in distribution A the values of mean, median and mode are identical. Hence it is known as a symmetrical distribution. In distribution B the mean has the highest value and the mode the lowest. The excess tail is on the right-hand side. Hence it is known as a positively skewed distribution. In distribution C the mean has the lowest value and the mode the highest. The excess tail is on the left-hand side. Hence it is known as a negatively skewed distribution.
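The three averages for distributions A, B and C can be reproduced with a short Python sketch for discrete series. The helper name `summary` is hypothetical. The median is the value of the middle item, or the average of the two middle items when N is even. If two values share the highest frequency, the sketch returns the first of them as the mode, which is a simplification.

```python
from itertools import accumulate

X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dists = {
    "A": [2, 3, 5, 7, 8, 7, 5, 3, 2],
    "B": [6, 7, 9, 8, 7, 6, 4, 2, 1],
    "C": [1, 2, 4, 5, 7, 9, 10, 7, 5],
}


def summary(xs, fs):
    """Return (mean, median, mode) of a discrete frequency distribution."""
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    mode = xs[fs.index(max(fs))]            # value with the highest frequency
    cum = list(accumulate(fs))

    def item(k):                            # value of the k-th item (1-based)
        return next(x for x, c in zip(xs, cum) if c >= k)

    if n % 2:
        median = item((n + 1) // 2)
    else:
        median = (item(n // 2) + item(n // 2 + 1)) / 2
    return mean, median, mode


for name, fs in dists.items():
    mean, median, mode = summary(X, fs)
    print(f"{name}: mean = {mean:.2f}, median = {median:g}, mode = {mode}")
```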

If we draw a histogram of each of these distributions, the diagrams would be as follows:

[Figure: SYMMETRICAL DISTRIBUTION (distribution A); histogram of frequencies.]

[Figure: POSITIVELY SKEWED DISTRIBUTION (distribution B); histogram of frequencies with Mo = 3.00, Med. = 4.00, $\bar X$ = 4.06 marked.]

[Figure: NEGATIVELY SKEWED DISTRIBUTION (distribution C); histogram of frequencies with Mo = 7, Med. = 6, $\bar X$ = 5.94 marked.]


The above example shows that:

(1) In a symmetrical distribution, the values of mean, median and mode are alike.

(2) In a positively skewed distribution, the value of the mean is greater than that of the mode. The value of the median is less than the mean but greater than the mode.

(3) In a negatively skewed distribution, the mode has the highest value and the arithmetic mean the lowest. The value of the median is less than the mode but greater than the mean.

It should be noted that in moderately skewed (asymmetrical) distributions the interval between the mean and the median is approximately one-third of the interval between the mean and the mode. This relationship provides a means of measuring the degree of skewness.

Difference between Dispersion and Skewness

Dispersion is concerned with the amount of variation rather than with its direction. Skewness tells us about the direction of the variation, or the departure from symmetry. In fact, measures of skewness depend upon the amount of dispersion.

It may be noted that although skewness is an important characteristic for defining the precise pattern of a distribution, it is rarely calculated for business and economic series. Variation is by far the most important characteristic of a distribution.

Tests of Skewness

The following tests may be applied to ascertain whether a distribution is skewed. Skewness is present if:

(i) The values of mean, median and mode do not coincide.

(ii) When the data are plotted on a graph they do not give the normal bell-shaped form, i.e., when the curve is cut along a vertical line through the centre, the two halves are not equal.

(iii) The sum of the positive deviations from the median is not equal to the sum of the negative deviations.

(iv) Quartiles are not equidistant from the median.

(v) Frequencies are not equally distributed at points of equal deviation from the mode.

Conversely, when skewness is absent, i.e., in the case of a symmetrical distribution, the following conditions are satisfied:

(i) The values of mean, median and mode coincide.

(ii) Data, when plotted on a graph, give the normal bell-shaped form.

(iii) The sum of the positive deviations from the median is equal to the sum of the negative deviations.

(iv) Quartiles are equidistant from the median.

(v) Frequencies are equally distributed at points of equal deviation from the mode.


Measures of Skewness

Measures of skewness tell us the direction and extent of asymmetry in a series, and permit us to compare two or more series in these respects. They may be either absolute or relative.

Absolute Measures of Skewness

Skewness can be measured in absolute terms by taking the difference between the mean and the mode. Symbolically,

Absolute Sk* $= \bar X - \text{Mode}$

* When skewness is based on quartiles, absolute skewness is given by the formula: Absolute Sk $= Q_3 + Q_1 - 2\,\text{Median}$.

If the value of the mean is greater than the mode, skewness will be positive, i.e., the above formula gives a plus sign. Conversely, if the value of the mode is greater than the mean, the formula gives a minus sign, meaning that the distribution is negatively skewed.

The difference between mean and mode can be used to measure skewness because in a symmetrical distribution the values of mean, median and mode are alike, whereas the mean moves away from the mode when the observations are asymmetrical. Consequently, the distance between the mean and the mode could be used to measure skewness: the greater this distance, whether positive or negative, the more asymmetrical the distribution. However, such a measure is unsatisfactory on two counts:

(i) It would be expressed in the units of the distribution and could not, therefore, be compared with a similar series expressed in different units.

(ii) Distributions vary greatly. The absolute difference between, say, the mean and the mode might be considerable in one series and small in another, even though the frequency curves of the two distributions were similarly skewed.

If the absolute differences were expressed in relation to some measure of the spread of values in their respective distributions, the measures would become relative and could be used directly for comparison. This leads us to the relative measures of skewness.


Relative Measures of Skewness 
There are four important measures of relative skewness, namely, 
1. The Karl Pearson’s coefficient of skewness, 
2. The Bowley's coefficient of skewness, 
3. The Kelly's coefficient of skewness, and 
4. Measure of skewness based on moments.*

These measures of skewness should mainly be used for making comparisons between two or more distributions. As a description of a single distribution, the interpretation of a measure of skewness is necessarily vague, e.g., "slight skewness", "marked skewness" or "moderate skewness".
Requisites of a good Measure of Skewness
A good measure of skewness should have three properties. It should:


1. be a pure number in the sense that its value should be independent 
of the units of the series and also of the degree of variation in the series, 


2. have a zero value, when the distribution is symmetrical, and 


3. have some meaningful scale of measurement so that we can easily
interpret the measured value.


1. Karl Pearson’s Coefficient of Skewness 


Karl Pearson's coefficient, also popularly known as the Pearsonian
coefficient of skewness, is based upon the difference between mean and
mode. This difference is divided by the standard deviation to give a relative
measure. The formula thus becomes:

$$Sk_P = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$

There is no limit to this measure in theory, and this is a slight drawback.
But in practice the value given by this formula is rarely very high
and usually lies between ±1.

* For details please refer to the section 'Moments' in this chapter.

When a distribution is symmetrical, the values of mean, median and
mode coincide and, therefore, the coefficient of skewness will be zero. When
a distribution is positively skewed, the coefficient of skewness shall have a
plus sign, and when it is negatively skewed, the coefficient of skewness
shall have a minus sign. The degree of skewness shall be obtained by the
numerical value, say, 0.8 or 0.2, etc. Thus this formula gives both the
direction as well as the extent of skewness.
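As a rough check on the points above, the following Python sketch computes the absolute measure and Pearson's coefficient for a small series of made-up wages. Assumptions: the data are hypothetical, the mode is taken as the single most frequent value (`statistics.mode`), and the standard deviation divides by N (`statistics.pstdev`), as elsewhere in this chapter. Re-expressing the same wages in paise multiplies the absolute measure by 100 but leaves the coefficient unchanged.

```python
import statistics

def absolute_skewness(values):
    """Absolute skewness: mean minus mode, in the units of the data."""
    return statistics.mean(values) - statistics.mode(values)

def pearson_skewness_mode(values):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation (divisor N)."""
    return absolute_skewness(values) / statistics.pstdev(values)

# Hypothetical wages in rupees, with a single clear mode at 3.
wages_rs = [2, 3, 3, 3, 4, 5, 6, 8]
wages_paise = [100 * w for w in wages_rs]

print(absolute_skewness(wages_rs), absolute_skewness(wages_paise))
print(round(pearson_skewness_mode(wages_rs), 3), round(pearson_skewness_mode(wages_paise), 3))
```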

The above method of measuring skewness cannot be used where the
mode is ill-defined. However, in moderately skewed distributions the
averages have the following relationship:

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$

and, therefore, if this value of mode is substituted in the above formula,
we arrive at another formula for finding out skewness:

$$Sk_P = \frac{\bar{X} - (3\,\text{Med.} - 2\bar{X})}{\sigma} = \frac{3(\bar{X} - \text{Med.})}{\sigma}$$

Theoretically this measure varies between ±3; however, in
practice it is rare that the coefficient of skewness obtained by the above
method exceeds ±1.
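A minimal sketch of the median-based form, assuming the empirical relation Mode = 3 Median − 2 Mean holds for the series. The inputs are the rounded mean, median and standard deviation obtained by hand in Illustration 3 of this chapter.

```python
def pearson_skewness_median(mean, median, sd):
    """Pearson's coefficient with Mode replaced by 3 Median - 2 Mean."""
    mode_estimate = 3 * median - 2 * mean
    return (mean - mode_estimate) / sd  # algebraically equal to 3 * (mean - median) / sd

sk = pearson_skewness_median(39.27, 45.0, 22.8)
print(round(sk, 3))
```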
2. Bowley's Coefficient of Skewness 

An alternative measure of skewness has been proposed by the late
Professor Bowley. Bowley's measure is based on quartiles. In a symmetrical
distribution the first and third quartiles are equidistant from the median, as
can be seen from the following diagram:


[Diagram: $Q_1$, Median and $Q_3$ marked on a line, with $Q_1$ and $Q_3$ equidistant from the median]


In a symmetrical distribution the third quartile is the same distance
above the median as the first quartile is below it, i.e.,

$$Q_3 - \text{Med.} = \text{Med.} - Q_1 \quad\text{or}\quad Q_3 + Q_1 - 2\,\text{Med.} = 0$$

If the distribution is positively skewed, the top 25 per cent of the
values will tend to be further from the median than the bottom 25 per cent,
i.e., $Q_3$ will be further from the median than $Q_1$ is; the reverse holds
for negative skewness. Hence a possible relative measure is

$$Sk_B = \frac{(Q_3 - \text{Med.}) - (\text{Med.} - Q_1)}{(Q_3 - \text{Med.}) + (\text{Med.} - Q_1)} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$

The denominator is in fact twice the quartile deviation, so that the
degree of skewness is again measured relative to the dispersion of the
distribution. This measure is called the quartile measure of skewness,
and the value of the coefficient obtained varies between ±1.
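The quartile coefficient needs only three positional values, so a sketch is a one-line function. The first example uses the quartiles and median worked out in Illustration 5 of this chapter; the other two are made-up cases showing a symmetrical layout (zero) and the upper limit of +1, reached when $Q_1$ coincides with the median.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 Median) / (Q3 - Q1), bounded by -1 and +1."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

print(round(bowley_skewness(22.625, 26.73, 32.07), 3))          # Illustration 5 values
print(bowley_skewness(10, 20, 30), bowley_skewness(10, 10, 30))  # symmetric; extreme case
```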

Wherever positional measures are called for, skewness should be
measured by Bowley's method. Thus this method is useful in open-end
distributions and where extreme values are present.


Comparison of Pearsonian and Bowley's Measures of Skewness 


Complete absence of skewness, i.e., symmetry, is indicated by zero
in both Karl Pearson's and Bowley's methods. They have in common
only the general form: both are derived from the difference
between two measures of central tendency, expressed as a ratio to a measure
of variation.

It must be remembered that the results obtained by these two
measures are not to be compared with one another. In particular, the numerical
values are not related to one another, since Bowley's measure,
because of its computational basis, is limited to values between −1 and
+1, while Pearson's measure has no such limitation.

Not only do the numerical values obtained from these two formulae
bear no necessary relationship to one another but, on rare occasions, with
unusually shaped distributions, it is possible for them to emerge with
opposite signs.
3. Kelly's Coefficient of Skewness 

Bowley's measure discussed above neglects the two extreme quarters
of the data. It would be better for a measure to cover the entire
data, especially because in measuring skewness we are often interested in
the more extreme items. Bowley's measure can be extended by taking
any two deciles or any two percentiles equidistant from the median.
Kelly has suggested the following formula for measuring skewness based
upon the 10th and the 90th percentiles (or the first and ninth deciles):

$$Sk_K = \frac{P_{90} + P_{10} - 2\,\text{Med.}}{P_{90} - P_{10}} \quad\text{or}\quad Sk_K = \frac{D_9 + D_1 - 2\,\text{Med.}}{D_9 - D_1}$$

This measure of skewness has some theoretical attraction if skewness
is to be based on percentiles. However, this method is not popular in
practice, and generally Karl Pearson's method is used.
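For ungrouped data the deciles can be taken from Python's standard `statistics.quantiles`. This is a sketch under an assumption: `statistics.quantiles` interpolates with its own default ('exclusive') rule, which may differ slightly from the grouped-data interpolation used in this chapter. Both test series are made up.

```python
import statistics

def kelly_skewness(values):
    """Kelly's coefficient (D9 + D1 - 2 Median) / (D9 - D1) for ungrouped data."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points D1 ... D9
    d1, d9 = deciles[0], deciles[-1]
    median = statistics.median(values)
    return (d9 + d1 - 2 * median) / (d9 - d1)

symmetric = list(range(1, 102))                                 # made-up, symmetric
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 15, 25, 40]   # made-up, long right tail
print(round(kelly_skewness(symmetric), 6), round(kelly_skewness(right_skewed), 3))
```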


4. Measure of Skewness based on the third moment 


A measure of skewness may be obtained by making use of the third
moment about the mean. This is discussed later in this chapter
under the head 'Moments'.

The following illustrations shall explain the application of the above 
methods : 

Illustration 1. Find the coefficient of skewness from the following data:

Value        6    12    18    24    30    36    42
Frequency    4     7     9    18    15    10     5
(B. Com., Nagpur, 1973)


SKEWNESS, MOMENTS AND KURTOSIS E-9:8 


Solution. CALCULATION OF COEFFICIENT OF SKEWNESS 


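 Value   Frequency   d' = (x − 24)/6     fd'     fd'²
   x         f
   6         4            −3             −12      36
  12         7            −2             −14      28
  18         9            −1              −9       9
  24        18             0               0       0
  30        15             1              15      15
  36        10             2              20      40
  42         5             3              15      45
          N = 68                     Σfd' = 15   Σfd'² = 173

Applying Karl Pearson's method: $Sk_P = \frac{\text{Mean} - \text{Mode}}{\sigma}$

Calculation of Mean: $\bar{X} = A + \frac{\Sigma fd'}{N} \times C$, where A = 24, Σfd' = 15, N = 68, C = 6

$\bar{X} = 24 + \frac{15}{68} \times 6 = 24 + 1.32 = 25.32$

Calculation of Mode: By inspection, the modal value is 24.

Calculation of Standard Deviation: $\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where Σfd'² = 173, Σfd' = 15, N = 68, C = 6

$\sigma = \sqrt{\frac{173}{68} - \left(\frac{15}{68}\right)^2} \times 6 = \sqrt{2.544 - 0.048} \times 6 = 1.58 \times 6 = 9.48$

$\bar{X} = 25.32$, Mo = 24, σ = 9.48

$Sk_P = \frac{25.32 - 24}{9.48} = \frac{1.32}{9.48} = +0.139$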
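The step-deviation arithmetic of Illustration 1 can be reproduced in a few lines of Python, as a sketch for checking the hand computation. Carrying the unrounded mean through gives about 0.1396 rather than 0.139; the difference comes only from rounding the mean to 25.32.

```python
import math

# Illustration 1: values and their frequencies.
values = [6, 12, 18, 24, 30, 36, 42]
freqs = [4, 7, 9, 18, 15, 10, 5]
A, C = 24, 6  # assumed mean and common factor used for step deviations

N = sum(freqs)
d = [(x - A) // C for x in values]  # step deviations d' (exact integers here)
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))

mean = A + sum_fd / N * C
sd = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * C
mode = values[freqs.index(max(freqs))]  # modal value by inspection (highest frequency)
sk = (mean - mode) / sd

print(N, sum_fd, sum_fd2)
print(round(mean, 2), mode, round(sd, 2), round(sk, 3))
```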
Illustration 2. Compute Karl Pearson's coefficient of skewness from the following data:

Variable      Frequency      Variable      Frequency
20.5–23.5         17         29.5–32.5        194
23.5–26.5        193         32.5–35.5         27
26.5–29.5        399         35.5–38.5         10

(B. Com., Delhi, 1969; B. Com., Gorakhpur, 1974)
Solution. CALCULATION OF COEFFICIENT OF SKEWNESS 


Variable    Mid-point   Frequency   (m − 28)   d' = (m − 28)/3     fd'    fd'²
                m           f
20.5–23.5      22          17          −6           −2             −34     68
23.5–26.5      25         193          −3           −1            −193    193
26.5–29.5      28         399           0            0               0      0
29.5–32.5      31         194           3            1             194    194
32.5–35.5      34          27           6            2              54    108
35.5–38.5      37          10           9            3              30     90
                       N = 840                                Σfd' = 51  Σfd'² = 653

Calculation of Mean: $\bar{X} = A + \frac{\Sigma fd'}{N} \times C$, where A = 28, Σfd' = 51, N = 840, C = 3

$\bar{X} = 28 + \frac{51}{840} \times 3 = 28 + 0.18 = 28.18$

Calculation of Mode: $\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$

Mode lies in the class 26.5–29.5;
L = 26.5, Δ1 = (399 − 193) = 206, Δ2 = (399 − 194) = 205, i = 3

$\text{Mo} = 26.5 + \frac{206}{206 + 205} \times 3 = 26.5 + 1.5 = 28$

Calculation of Standard Deviation: $\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where Σfd'² = 653, Σfd' = 51, N = 840, C = 3

$\sigma = \sqrt{\frac{653}{840} - \left(\frac{51}{840}\right)^2} \times 3 = \sqrt{0.777 - 0.0037} \times 3 = 0.88 \times 3 = 2.64$

Mean = 28.18, Mode = 28, σ = 2.64

Substituting the values in the formula, $Sk_P = \frac{28.18 - 28}{2.64} = +0.068$
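A sketch of the same computation for Illustration 2, including the interpolation formula for the mode. The code assumes the modal class is neither the first nor the last class, so that both neighbouring frequencies exist; without intermediate rounding the coefficient comes to about 0.0676.

```python
import math

# Illustration 2: lower class limits and frequencies (class width 3).
lower = [20.5, 23.5, 26.5, 29.5, 32.5, 35.5]
freqs = [17, 193, 399, 194, 27, 10]
width, A = 3.0, 28.0  # A is the mid-point of the modal class, used as assumed mean

N = sum(freqs)
d = [(low + width / 2 - A) / width for low in lower]  # step deviations d'
sum_fd = sum(f * x for f, x in zip(freqs, d))
sum_fd2 = sum(f * x * x for f, x in zip(freqs, d))
mean = A + sum_fd / N * width
sd = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * width

# Interpolated mode: L + delta1 / (delta1 + delta2) * i
# (assumes the modal class has a neighbouring class on each side)
k = freqs.index(max(freqs))
delta1 = freqs[k] - freqs[k - 1]
delta2 = freqs[k] - freqs[k + 1]
mode = lower[k] + delta1 / (delta1 + delta2) * width

sk = (mean - mode) / sd
print(round(mean, 2), round(mode, 2), round(sd, 2), round(sk, 3))
```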


Illustration 3. Calculate Karl Pearson's coefficient of skewness from the
following data:

Marks        No. of students      Marks        No. of students
Above 0            150            Above 50            70
Above 10           140            Above 60            30
Above 20           100            Above 70            14
Above 30            80            Above 80             0
Above 40            80

(B. Com., Delhi, 1972)
Solution. CALCULATION OF COEFFICIENT OF SKEWNESS 


Marks      f    m.p. m   d' = (m − 35)/10    fd'    fd'²    c.f.
0–10      10      5           −3            −30     90      10
10–20     40     15           −2            −80    160      50
20–30     20     25           −1            −20     20      70
30–40      0     35            0              0      0      70
40–50     10     45            1             10     10      80
50–60     40     55            2             80    160     120
60–70     16     65            3             48    144     136
70–80     14     75            4             56    224     150
        N = 150                        Σfd' = 64  Σfd'² = 808

As it is a bi-modal series, the coefficient of skewness will be calculated by

$$Sk_P = \frac{3(\bar{X} - \text{Med.})}{\sigma}$$

(B. Com., Madras, 1977)

Calculating Mean, Median and Standard Deviation

Calculation of Mean: $\bar{X} = A + \frac{\Sigma fd'}{N} \times C$, where A = 35, Σfd' = 64, N = 150, C = 10

$\bar{X} = 35 + \frac{64}{150} \times 10 = 35 + 4.27 = 39.27$

Calculation of Median: Med. = size of N/2th item = 150/2 = 75th item. The median lies in the class 40–50.

$\text{Med.} = L + \frac{N/2 - c.f.}{f} \times i$, where L = 40, N/2 = 75, c.f. = 70, f = 10, i = 10

$\text{Med.} = 40 + \frac{75 - 70}{10} \times 10 = 45$

Calculation of Standard Deviation: $\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C$, where Σfd'² = 808, Σfd' = 64, N = 150, C = 10

$\sigma = \sqrt{\frac{808}{150} - \left(\frac{64}{150}\right)^2} \times 10 = \sqrt{5.387 - 0.182} \times 10 = \sqrt{5.205} \times 10 = 2.28 \times 10 = 22.8$

$\bar{X} = 39.27$, Med. = 45, σ = 22.8

$Sk_P = \frac{3(39.27 - 45)}{22.8} = \frac{3(-5.73)}{22.8} = \frac{-17.19}{22.8} = -0.754$
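The grouped median formula used above can be written as a small interpolation helper. `grouped_quantile` is a hypothetical name introduced for this sketch; it finds the class containing a given cumulative position and interpolates within it, assuming equal class widths.

```python
import math

def grouped_quantile(lower, freqs, width, position):
    """Interpolate L + (position - c.f.) / f * i in the class containing `position`."""
    cf = 0  # cumulative frequency of the classes before the current one
    for low, f in zip(lower, freqs):
        if f > 0 and cf + f >= position:
            return low + (position - cf) / f * width
        cf += f
    raise ValueError("position exceeds the total frequency")

# Illustration 3: class lower limits and frequencies obtained from the "above" table.
lower = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [10, 40, 20, 0, 10, 40, 16, 14]
width, A = 10, 35

N = sum(freqs)
d = [(low + width / 2 - A) / width for low in lower]  # step deviations of mid-points
sum_fd = sum(f * x for f, x in zip(freqs, d))
sum_fd2 = sum(f * x * x for f, x in zip(freqs, d))
mean = A + sum_fd / N * width
sd = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * width
median = grouped_quantile(lower, freqs, width, N / 2)
sk = 3 * (mean - median) / sd
print(round(mean, 2), median, round(sd, 2), round(sk, 3))
```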


Illustration 4. You are given the position in a factory before and after the
settlement of an industrial dispute. Comment on the gains or losses from the point of
view of the workers and that of the management.

                               Before      After
No. of workers                  2,400      2,350
Mean wages (Rs.)                 45.5       47.5
Median wages (Rs.)               48.0       45.0
Standard Deviation (Rs.)         12.0       10.0

(B. Com., Mysore, 1974)


Solution. The following comments can be made on the basis of the information
given:

(i) By comparing the total wage bill we can comment on the increase or decrease
in the level of wages.

Total wage bill before the settlement of dispute = 2,400 × 45.5 = Rs. 1,09,200
Total wage bill after the settlement of dispute = 2,350 × 47.5 = Rs. 1,11,625

Hence the total wage bill has gone up after the settlement of dispute even
though the number of workers has decreased from 2,400 to 2,350. This means that the
average wage is now higher. This is definitely a gain to the workers.

Conversely, we cannot say that the increased wage bill is necessarily a loss to
management, because if it results in greater efficiency of workers and, therefore, higher
productivity, it would be a positive gain to management also.

(ii) The median before settlement of the dispute was 48 and after settlement it is 45.
This means that formerly 50% of workers used to get wages above Rs. 48 and now they
get wages above only Rs. 45.

(iii) By comparing the coefficient of variation before and after the settlement of
dispute we can comment on the distribution of wages.

Coefficient of variation before the settlement of dispute:
$C.V. = \frac{\sigma}{\bar{X}} \times 100$, where σ = 12, $\bar{X}$ = 45.5
$C.V. = \frac{12}{45.5} \times 100 = 26.37$

Coefficient of variation after the settlement of dispute, where σ = 10, $\bar{X}$ = 47.5:
$C.V. = \frac{10}{47.5} \times 100 = 21.05$

Since the value of the coefficient of variation has decreased from 26.37 to 21.05,
there is sufficient evidence to conclude that wages are more uniformly distributed after
the settlement of dispute or, in other words, there is less inequality in the distribution
of wages after the dispute is settled.

(iv) By comparing skewness we can comment upon the nature of the
distributions.

Coefficient of skewness before the settlement of dispute:
$Sk_P = \frac{3(\bar{X} - \text{Med.})}{\sigma} = \frac{3(45.5 - 48)}{12} = \frac{-7.5}{12} = -0.625$

Coefficient of skewness after the settlement of dispute:
$Sk_P = \frac{3(47.5 - 45)}{10} = \frac{7.5}{10} = +0.75$

Thus the distribution is positively skewed after the settlement of dispute,
whereas it was negatively skewed before the settlement of dispute. This suggests that
the number of workers getting low wages has increased considerably and that of
workers getting high wages has fallen, though the average wage of workers has increased.
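The three comparisons in Illustration 4 reduce to simple arithmetic on the summary figures. In this sketch `summarise` is a hypothetical helper, and Python's `,` format groups digits in thousands (109,200) rather than in the Indian style used in the text (1,09,200).

```python
# Illustration 4: summary figures before and after the settlement.
before = {"workers": 2400, "mean": 45.5, "median": 48.0, "sd": 12.0}
after = {"workers": 2350, "mean": 47.5, "median": 45.0, "sd": 10.0}

def summarise(p):
    """Return (total wage bill, coefficient of variation in %, median-based Pearson Sk)."""
    wage_bill = p["workers"] * p["mean"]
    cv = p["sd"] / p["mean"] * 100
    sk = 3 * (p["mean"] - p["median"]) / p["sd"]
    return wage_bill, cv, sk

for label, p in (("Before", before), ("After", after)):
    bill, cv, sk = summarise(p)
    print(f"{label}: wage bill Rs. {bill:,.0f}, C.V. {cv:.2f}%, Sk {sk:+.3f}")
```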

Illustration 5. Calculate the measure of skewness based on quartiles and median
from the following data:

Variable    Frequency      Variable    Frequency
10–20          358         50–60           62
20–30        2,417         60–70           18
30–40          976         70–80           10
40–50          129

Solution. The measure of skewness based on quartiles and median is given by the
formula:

$$Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$

CALCULATION OF Q1, Q3 AND MEDIAN

Variable     Frequency      c.f.
10–20           358          358
20–30         2,417        2,775
30–40           976        3,751
40–50           129        3,880
50–60            62        3,942
60–70            18        3,960
70–80            10        3,970

Calculation of $Q_1$: $Q_1$ = size of N/4th item = 3,970/4 = 992.5th item. Hence $Q_1$ lies in the class 20–30.

$Q_1 = L + \frac{N/4 - c.f.}{f} \times i$, where L = 20, N/4 = 992.5, c.f. = 358, f = 2,417, i = 10

$Q_1 = 20 + \frac{992.5 - 358}{2417} \times 10 = 20 + 2.625 = 22.625$

Calculation of $Q_3$: $Q_3$ = size of 3N/4th item = (3 × 3,970)/4 = 2,977.5th item. $Q_3$ lies in the class 30–40.

$Q_3 = L + \frac{3N/4 - c.f.}{f} \times i$, where L = 30, 3N/4 = 2,977.5, c.f. = 2,775, f = 976, i = 10

$Q_3 = 30 + \frac{2977.5 - 2775}{976} \times 10 = 30 + \frac{2025}{976} = 30 + 2.07 = 32.07$

Calculation of Median: Med. = size of N/2th item = 3,970/2 = 1,985th item. The median lies in the class 20–30.

$\text{Med.} = L + \frac{N/2 - c.f.}{f} \times i$, where L = 20, N/2 = 1,985, c.f. = 358, f = 2,417, i = 10

$\text{Med.} = 20 + \frac{1985 - 358}{2417} \times 10 = 20 + 6.73 = 26.73$

$Q_1$ = 22.625, $Q_3$ = 32.07 and Med. = 26.73

$\text{Coeff. of } Sk_B = \frac{32.07 + 22.625 - 2(26.73)}{32.07 - 22.625} = \frac{1.235}{9.445} = 0.131$
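The quartile interpolation of Illustration 5 can be checked with a sketch like the following. `grouped_quantile` is a hypothetical helper implementing $L + \frac{\text{position} - c.f.}{f} \times i$ for equal class widths; small differences from the hand figures come from rounding.

```python
def grouped_quantile(lower, freqs, width, position):
    """Interpolate L + (position - c.f.) / f * i in the class containing `position`."""
    cf = 0  # cumulative frequency of the classes before the current one
    for low, f in zip(lower, freqs):
        if f > 0 and cf + f >= position:
            return low + (position - cf) / f * width
        cf += f
    raise ValueError("position exceeds the total frequency")

# Illustration 5: class lower limits and frequencies (class width 10).
lower = [10, 20, 30, 40, 50, 60, 70]
freqs = [358, 2417, 976, 129, 62, 18, 10]
N = sum(freqs)

q1 = grouped_quantile(lower, freqs, 10, N / 4)
med = grouped_quantile(lower, freqs, 10, N / 2)
q3 = grouped_quantile(lower, freqs, 10, 3 * N / 4)
sk_b = (q3 + q1 - 2 * med) / (q3 - q1)  # Bowley's coefficient
print(round(q1, 3), round(med, 2), round(q3, 2), round(sk_b, 3))
```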
Illustration 6. In a certain distribution the following results were obtained:

$\bar{X}$ = 45.00, Median = 48.00, Coefficient of skewness = −0.4

The person who gave you the data failed to give the value of the standard deviation,
and you are required to estimate it with the help of the available information.
(B. Com., Bombay, 1974)

Solution. $\text{Coeff. of } Sk_P = \frac{3(\text{Mean} - \text{Median})}{\sigma}$

We are given $\bar{X}$ = 45, Median = 48 and coefficient of skewness = −0.4.
Substituting these values in the above formula:

$-0.4 = \frac{3(45 - 48)}{\sigma} = \frac{-9}{\sigma}$, or $\sigma = \frac{9}{0.4} = 22.5$

MOMENTS


In Physics, ‘moment’ is a measure of a force with respect to its
tendency to produce rotation. The strength of the tendency depends on
the amount of force and the distance from the origin of the point at
which the force is exerted. If a number of forces, $F_1, F_2, \ldots, F_n$, are applied
at distances $X_1, X_2, \ldots, X_n$, the moment of the first force about the origin is
$F_1X_1$, the moment of the second force is $F_2X_2$, etc. These moments are
additive, so that $\Sigma FX$ is the total moment about the origin. If the total
moment is divided by the total force, the quotient is termed "a moment
coefficient". The formula is $\frac{\Sigma FX}{N}$, where $N = \Sigma F$ is the total force.


It can be seen that the formula for a moment coefficient is identical
with that for an arithmetic mean. This identity has led statisticians to
speak of the arithmetic mean as the "first moment about the origin".
Technically the mean is a moment coefficient and not a total moment,
but in the case of frequency curves, with which mathematical statistics is
primarily concerned, the total frequency N is generally taken as unity, so
that the total moment and the moment coefficient are identical. In any
case it has become customary in statistics to speak of the mean $\bar{X} = \frac{\Sigma X}{N}$
as the first moment about the origin, and the distinction between the total
moment and the moment coefficient is ignored. The concept of moments is also
extended to higher powers. The statistical definition of the term ‘moment’
can be given as follows:


If we take the mean of the first power of the deviations of the items from their
arithmetic mean, we get the first moment about the mean; the mean of the squares
of the deviations gives us the second moment about the mean; the mean of the cubes
of the deviations gives the third moment about the mean; and so on. The moments
about the mean are denoted by the Greek letter μ (read as mu); thus $\mu_1$ stands for
the first moment about the mean, $\mu_2$ stands for the second moment about the mean,
etc. Symbolically, writing $x = X - \bar{X}$:

$$\mu_1 = \frac{\Sigma(X - \bar{X})}{N} = \frac{\Sigma x}{N}$$

[Since the sum of the deviations of items from the arithmetic mean is always zero, $\mu_1$ would always be zero.]

$$\mu_2 = \frac{\Sigma(X - \bar{X})^2}{N} = \frac{\Sigma x^2}{N} = \sigma^2 \quad\text{or}\quad \sigma = \sqrt{\mu_2}$$

$$\mu_3 = \frac{\Sigma(X - \bar{X})^3}{N} = \frac{\Sigma x^3}{N}$$

$$\mu_4 = \frac{\Sigma(X - \bar{X})^4}{N} = \frac{\Sigma x^4}{N}$$

For a frequency distribution:

$$\mu_1 = \frac{\Sigma f(X - \bar{X})}{N}, \quad \mu_2 = \frac{\Sigma f(X - \bar{X})^2}{N}, \quad \mu_3 = \frac{\Sigma f(X - \bar{X})^3}{N}, \quad \mu_4 = \frac{\Sigma f(X - \bar{X})^4}{N}$$

Moments can be extended to higher powers in a similar way, but
generally in practice the first four moments suffice. Furthermore, as pointed
out by Yule and Kendall, "moments of higher order, though important
in theory, are so extremely sensitive to sampling fluctuations that values ..."

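Two important constants of a distribution are calculated from $\mu_2$, $\mu_3$
and $\mu_4$. They are:

(i) $\beta_1$ (read as beta one): $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$

(ii) $\beta_2$ (read as beta two): $\beta_2 = \frac{\mu_4}{\mu_2^2}$

$\beta_1$ measures skewness and $\beta_2$ kurtosis.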

In a symmetrical distribution all odd moments, i.e., $\mu_1$, $\mu_3$, etc., would
always be zero. The reason is that if the curve is symmetrical, for each deviation
above the mean there will be a deviation below the mean of exactly the same size;
therefore positive and negative deviations will exactly balance and cancel out
when added, i.e., $\Sigma(X - \bar{X})$, $\Sigma(X - \bar{X})^3$, etc. will be zero. Of course,
if the deviations are raised to even powers, their sign will always be positive and
they will no longer cancel out. But the sums of the odd powers will all be equal to
zero on account of the cancellations. Thus odd moments are always zero in a
symmetrical distribution. However, this rule does not hold true in
asymmetrical distributions.
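A quick numerical illustration of the cancellation argument: for a made-up series that is a mirror image about its mean, the first and third central moments come out exactly zero, while a made-up right-skewed series gives a positive third moment.

```python
def central_moment(values, r):
    """r-th moment about the arithmetic mean: sum((X - mean) ** r) / N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

symmetric = [2, 4, 4, 6, 6, 6, 8, 8, 10]  # mirror image about its mean, 6
skewed = [1, 2, 2, 3, 3, 3, 4, 9, 15]     # long right tail

print([central_moment(symmetric, r) for r in (1, 2, 3, 4)])
print(round(central_moment(skewed, 3), 2))
```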

Where the actual mean is in fractions, it is difficult to calculate
moments by applying the above formulae. In such a case we can first
compute moments about an arbitrary origin A and then convert these
moments into moments about the actual mean. Moments about an arbitrary
origin are denoted by the symbol μ' to distinguish them from the moments
about the mean, which are denoted by μ. Thus $\mu_1'$ would stand for the first
moment about an arbitrary origin A, $\mu_2'$ for the second moment about an
arbitrary origin A, and so on. The calculations shall be done as follows:


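$$\mu_1' = \frac{\Sigma(X - A)}{N}, \quad \mu_2' = \frac{\Sigma(X - A)^2}{N}, \quad \mu_3' = \frac{\Sigma(X - A)^3}{N}, \quad \mu_4' = \frac{\Sigma(X - A)^4}{N}$$

For a frequency distribution:

$$\mu_1' = \frac{\Sigma f(X - A)}{N} \ \text{or}\ \frac{\Sigma fd}{N} \ \text{or}\ \frac{\Sigma fd'}{N} \times C \qquad \left[\text{where } d = X - A,\ d' = \frac{X - A}{C}\right]$$

$$\mu_2' = \frac{\Sigma f(X - A)^2}{N} \ \text{or}\ \frac{\Sigma fd^2}{N} \ \text{or}\ \frac{\Sigma fd'^2}{N} \times C^2$$

In the same way the formulae for the 3rd and 4th moments can be
written.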


Conversion of Moments About an arbitrary Origin into Moments 
about Mean or Central Moments 


For the sake of simplicity in calculations, moments are first calculated
about an arbitrary origin. If we want to obtain moments about the mean, we
can do so with the help of the following relationships:

$$\mu_1 = 0; \quad \mu_2 = \mu_2' - (\mu_1')^2; \quad \mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3$$

$$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4$$
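The conversion relationships translate directly into a function. This sketch is checked against the moments about X = 2 given in Illustration 10 and the moments about 25 found in Illustration 9 of this chapter.

```python
def central_from_arbitrary(m1, m2, m3, m4):
    """Convert moments about an arbitrary origin (mu'_1..mu'_4) to central moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return 0, mu2, mu3, mu4  # mu1 about the mean is always zero

print(central_from_arbitrary(1, 2.5, 5.5, 16))      # Illustration 10, origin X = 2
print(central_from_arbitrary(-3, 90, -900, 21000))  # Illustration 9, origin 25
```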
Moments about Zero 


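The moments about zero are often denoted by $\nu_1$, $\nu_2$, $\nu_3$, etc. and are
obtained as follows:

$$\nu_1 = \frac{\Sigma X}{N}, \quad \nu_2 = \frac{\Sigma X^2}{N}, \quad \nu_3 = \frac{\Sigma X^3}{N}, \quad \nu_4 = \frac{\Sigma X^4}{N}$$

Also:

The first moment about zero, or $\nu_1 = A + \mu_1'$, or the mean*
The second moment about zero, or $\nu_2 = \mu_2 + \nu_1^2$
The third moment about zero, or $\nu_3 = \mu_3 + 3\nu_1\nu_2 - 2\nu_1^3$
The fourth moment about zero, or $\nu_4 = \mu_4 + 4\nu_1\nu_3 - 6\nu_1^2\nu_2 + 3\nu_1^4$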
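Similarly, the relations for moments about zero can be sketched as a function whose arguments are the arbitrary origin A, $\mu_1'$ and the central moments. The checks use the figures of Illustrations 10 and 9 of this chapter.

```python
def zero_from_central(A, m1_prime, mu2, mu3, mu4):
    """Moments about zero (nu_1..nu_4) from origin A, mu'_1 and the central moments."""
    v1 = A + m1_prime  # the mean
    v2 = mu2 + v1 ** 2
    v3 = mu3 + 3 * v1 * v2 - 2 * v1 ** 3
    v4 = mu4 + 4 * v1 * v3 - 6 * v1 ** 2 * v2 + 3 * v1 ** 4
    return v1, v2, v3, v4

print(zero_from_central(2, 1, 1.5, 0, 6))          # Illustration 10
print(zero_from_central(25, -3, 81, -144, 14817))  # Illustration 9
```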


Purpose of Moments. The concept of moment is of great significance
in statistical work. With the help of moments we can measure the central
tendency of a set of observations, their variability, their asymmetry and
the height of the peak their curve would make. Because of the great
convenience in obtaining measures of the various characteristics of a
frequency distribution, the calculation of the first four moments about the
mean may well be made the first step in the analysis of a frequency
distribution.

Sheppard's Correction for Grouping Errors 


While calculating moments it is assumed that all the values of a
variable in a class interval are concentrated at the centre of that interval
(i.e., the mid-point). However, in practice this is not so: the assumption is an
approximation to facilitate calculations, and it introduces some error, which
is known as grouping error. But for distributions of symmetrical or
moderately skew type and class intervals not greater than about one-twentieth
of the range, the approximation may be a very close one. In
other cases we should apply Sheppard's corrections to eliminate grouping
error.

The moments that we have computed, which have not been corrected
by Sheppard's process, are called the crude moments, to distinguish them
from the adjusted moments, which we get by applying Sheppard's
corrections. These corrections are:

$$\mu_2\ (\text{corrected}) = \mu_2\ (\text{uncorrected}) - \frac{i^2}{12}$$

$$\mu_4\ (\text{corrected}) = \mu_4\ (\text{uncorrected}) - \frac{1}{2}i^2\mu_2\ (\text{uncorrected}) + \frac{7}{240}i^4$$

where i is the width of the class interval.

The first and third moments need no correction.

* The signs are the reverse of what we had while converting moments about an
arbitrary origin to moments about the mean.
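A minimal sketch of Sheppard's corrections as stated above. The crude moments and class width in the example are hypothetical, and the function does not check whether the conditions listed below are satisfied.

```python
def sheppard_corrected(mu2, mu4, i):
    """Sheppard's corrections for grouping; returns corrected (mu2, mu4).

    mu1 and mu3 need no correction; i is the class width.
    """
    mu2_corr = mu2 - i ** 2 / 12
    mu4_corr = mu4 - 0.5 * i ** 2 * mu2 + 7 / 240 * i ** 4  # uses the crude mu2
    return mu2_corr, mu4_corr

# Hypothetical crude moments for a grouped series with class width 10.
print(sheppard_corrected(100.0, 30000.0, 10))
```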


Conditions for applying Sheppard’s Corrections 

The following conditions should be satisfied for the application of 
Sheppard's corrections : 

(i) The correction should not be made unless the total frequency is at
least 1,000; otherwise the moments will be more affected by sampling
errors than by grouping errors.

(ii) The correction is not applicable to J- or U-shaped distributions
or even to the skew form.

(iii) The observations should relate to a continuous variable.

(iv) The frequencies should taper off to zero in both directions, i.e.,
the curve should approach the base line gradually and slowly at each end
of the distribution.

However, as pointed out by A.E. Waugh, "the corrections are small
and the statistician is foolish to bother with them if the original figures
are rough approximations. But where we have continuous data with the
characteristics described above and where the original measurements are
reasonably precise, we may well apply Sheppard's corrections to eliminate
the grouping error".

Illustration 7. Analyse the frequency distribution by the method of moments:

X:   2   3   4   5   6
f:   1   3   7   3   1

Solution. CALCULATION OF FIRST FOUR MOMENTS

 X    f   x = X − 4     fx     fx²     fx³     fx⁴
 2    1      −2         −2      4      −8      16
 3    3      −1         −3      3      −3       3
 4    7       0          0      0       0       0
 5    3       1          3      3       3       3
 6    1       2          2      4       8      16
    N = 15          Σfx = 0  Σfx² = 14  Σfx³ = 0  Σfx⁴ = 38

$\mu_1 = \frac{\Sigma fx}{N} = \frac{0}{15} = 0$;  $\mu_2 = \frac{\Sigma fx^2}{N} = \frac{14}{15} = 0.933$

$\mu_3 = \frac{\Sigma fx^3}{N} = \frac{0}{15} = 0$;  $\mu_4 = \frac{\Sigma fx^4}{N} = \frac{38}{15} = 2.533$

$\sigma = \sqrt{\text{variance}} = \sqrt{\mu_2} = \sqrt{0.933} = 0.966$

$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0}{(0.933)^3} = 0$

In a symmetrical distribution $\beta_1$ is zero. Hence this distribution is symmetrical.

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{2.533}{(0.933)^2} = \frac{2.533}{0.871} = 2.91$

Since the value of $\beta_2$ is less than three, the curve is platykurtic.
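The same analysis can be sketched in Python directly from the frequency table of Illustration 7, computing central moments about the actual mean (which is exactly 4 here).

```python
# Illustration 7: values and frequencies.
xs = [2, 3, 4, 5, 6]
fs = [1, 3, 7, 3, 1]

N = sum(fs)
mean = sum(f * x for f, x in zip(fs, xs)) / N
# Central moments mu_r = sum f (X - mean)^r / N for r = 2, 3, 4
mu = {r: sum(f * (x - mean) ** r for f, x in zip(fs, xs)) / N for r in (2, 3, 4)}
beta1 = mu[3] ** 2 / mu[2] ** 3
beta2 = mu[4] / mu[2] ** 2

print(mean, round(mu[2], 3), mu[3], round(mu[4], 3))
print(beta1, round(beta2, 3))
```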
Illustration 8. From the following data of the wages of 50 workers of a factory,
compute the first four moments about the mean and also the value of $\beta_2$:

Monthly wages (Rs.)   No. of workers     Monthly wages (Rs.)   No. of workers
100–120                     1            180–200                    12
120–140                     3            200–220                     4
140–160                     7            220–240                     3
160–180                    20

Solution. CALCULATION OF FIRST FOUR MOMENTS

Monthly wages   Mid-point   No. of       d' = (m − 170)/20    fd'    fd'²    fd'³    fd'⁴
(Rs.)               m       workers f
100–120            110          1               −3             −3      9     −27      81
120–140            130          3               −2             −6     12     −24      48
140–160            150          7               −1             −7      7      −7       7
160–180            170         20                0              0      0       0       0
180–200            190         12                1             12     12      12      12
200–220            210          4                2              8     16      32      64
220–240            230          3                3              9     27      81     243
                            N = 50                       Σfd' = 13  Σfd'² = 83  Σfd'³ = 67  Σfd'⁴ = 455

$\mu_1' = \frac{\Sigma fd'}{N} \times C = \frac{13}{50} \times 20 = 5.2$

$\mu_2' = \frac{\Sigma fd'^2}{N} \times C^2 = \frac{83}{50} \times 400 = 664$

$\mu_3' = \frac{\Sigma fd'^3}{N} \times C^3 = \frac{67}{50} \times 8{,}000 = 10{,}720$

$\mu_4' = \frac{\Sigma fd'^4}{N} \times C^4 = \frac{455}{50} \times 1{,}60{,}000 = 14{,}56{,}000$

Moments about the mean:

$\mu_2 = \mu_2' - (\mu_1')^2 = 664 - (5.2)^2 = 664 - 27.04 = 636.96$

$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = 10{,}720 - (3 \times 5.2 \times 664) + 2(5.2)^3 = 10{,}720 - 10{,}358.4 + 281.216 = 642.816$

$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 = 14{,}56{,}000 - (4 \times 5.2 \times 10{,}720) + 6(5.2)^2 \times 664 - 3(5.2)^4$
$\quad = 14{,}56{,}000 - 2{,}22{,}976 + 1{,}07{,}727.4 - 2{,}193.5 = 13{,}38{,}557.9$

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{13{,}38{,}557.9}{(636.96)^2} = 3.30$

Since the value of $\beta_2$ is greater than 3, the curve is leptokurtic.
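A sketch of the step-deviation route for Illustration 8: moments about the arbitrary origin 170 are scaled by powers of C = 20 and then converted to central moments. Without rounding, $\beta_2$ comes to about 3.299.

```python
# Illustration 8: mid-points of wage classes (Rs.) and numbers of workers.
mids = [110, 130, 150, 170, 190, 210, 230]
freqs = [1, 3, 7, 20, 12, 4, 3]
A, C = 170, 20  # arbitrary origin and class width

N = sum(freqs)
d = [(m - A) // C for m in mids]  # step deviations d' (exact integers)
# Moments about A: mu'_r = (sum f d'^r / N) * C^r
m1, m2, m3, m4 = (sum(f * x ** r for f, x in zip(freqs, d)) / N * C ** r for r in (1, 2, 3, 4))

# Convert to moments about the mean
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
beta2 = mu4 / mu2 ** 2

print(m1, m2, m3, m4)
print(round(mu2, 2), round(mu3, 3), round(mu4, 1), round(beta2, 2))
```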
Illustration 9. From the following data, calculate moments about:
(i) the assumed mean 25,
(ii) the actual mean, and
(iii) zero.

Variable:    0–10    10–20    20–30    30–40
Frequency:     1        3        4        2

Solution. CALCULATION OF MOMENTS ABOUT ARBITRARY ORIGIN

Variable    f    m.p. m   d' = (m − 25)/10    fd'    fd'²    fd'³    fd'⁴
0–10        1      5           −2             −2      4      −8      16
10–20       3     15           −1             −3      3      −3       3
20–30       4     25            0              0      0       0       0
30–40       2     35            1              2      2       2       2
          N = 10                        Σfd' = −3  Σfd'² = 9  Σfd'³ = −9  Σfd'⁴ = 21

$\mu_1' = \frac{\Sigma fd'}{N} \times C = \frac{-3}{10} \times 10 = -3$

$\mu_2' = \frac{\Sigma fd'^2}{N} \times C^2 = \frac{9}{10} \times 100 = 90$

$\mu_3' = \frac{\Sigma fd'^3}{N} \times C^3 = \frac{-9}{10} \times 1{,}000 = -900$

$\mu_4' = \frac{\Sigma fd'^4}{N} \times C^4 = \frac{21}{10} \times 10{,}000 = 21{,}000$

Moments about Mean: $\mu_1 = 0$

$\mu_2 = \mu_2' - (\mu_1')^2 = 90 - (-3)^2 = 81$

$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = -900 - 3(-3)(90) + 2(-3)^3 = -900 + 810 - 54 = -144$

$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 = 21{,}000 - 4(-3)(-900) + 6(-3)^2(90) - 3(-3)^4$
$\quad = 21{,}000 - 10{,}800 + 4{,}860 - 243 = 14{,}817$

Moments about Zero: $\nu_1 = A + \mu_1'$, or the mean $= 25 - 3 = 22$

$\nu_2 = \mu_2 + \nu_1^2 = 81 + (22)^2 = 565$

$\nu_3 = \mu_3 + 3\nu_1\nu_2 - 2\nu_1^3 = -144 + 3(22)(565) - 2(22)^3 = -144 + 37{,}290 - 21{,}296 = 15{,}850$

$\nu_4 = \mu_4 + 4\nu_1\nu_3 - 6\nu_1^2\nu_2 + 3\nu_1^4 = 14{,}817 + 4(22)(15{,}850) - 6(22)^2(565) + 3(22)^4$
$\quad = 14{,}817 + 13{,}94{,}800 - 16{,}40{,}760 + 7{,}02{,}768 = 4{,}71{,}625$
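Because the mid-points of Illustration 9 are small integers, the moments about zero and about the mean can also be computed straight from their definitions, which gives an independent check on the conversion formulas used above.

```python
# Illustration 9: class mid-points and frequencies.
mids = [5, 15, 25, 35]
freqs = [1, 3, 4, 2]
N = sum(freqs)

about_zero = [sum(f * m ** r for f, m in zip(freqs, mids)) / N for r in (1, 2, 3, 4)]
mean = about_zero[0]
about_mean = [sum(f * (m - mean) ** r for f, m in zip(freqs, mids)) / N for r in (2, 3, 4)]

print(about_zero)   # [22.0, 565.0, 15850.0, 471625.0]
print(about_mean)   # [81.0, -144.0, 14817.0]
```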
Illustration 10. The first four moments of a distribution about X = 2 are
1, 2.5, 5.5 and 16. Calculate the four moments about $\bar{X}$ and about zero.
(M. Com., Delhi, 1972)

Solution. We are given $\mu_1' = 1$, $\mu_2' = 2.5$, $\mu_3' = 5.5$ and $\mu_4' = 16$. From these
moments about the arbitrary origin we can find the moments about the mean from the following
relationships:

$\mu_2 = \mu_2' - (\mu_1')^2$
$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3$
$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4$

Substituting the values:

$\mu_2 = 2.5 - (1)^2 = 1.5$
$\mu_3 = 5.5 - 3(1)(2.5) + 2(1)^3 = 5.5 - 7.5 + 2 = 0$
$\mu_4 = 16 - 4(1)(5.5) + 6(1)^2(2.5) - 3(1)^4 = 16 - 22 + 15 - 3 = 6$

Thus the moments about the mean are $\mu_1 = 0$, $\mu_2 = 1.5$, $\mu_3 = 0$, $\mu_4 = 6$.

Moments about zero: Let the moments about zero be denoted by $\nu_1$, $\nu_2$, $\nu_3$, etc.

The first moment about zero, i.e., $\nu_1 = A + \mu_1'$, or the mean
The second moment about zero, i.e., $\nu_2 = \mu_2 + \nu_1^2$
The third moment about zero, i.e., $\nu_3 = \mu_3 + 3\nu_1\nu_2 - 2\nu_1^3$
The fourth moment about zero, i.e., $\nu_4 = \mu_4 + 4\nu_1\nu_3 - 6\nu_1^2\nu_2 + 3\nu_1^4$

In this question:

$\nu_1 = 2 + 1 = 3$; $\nu_2 = 1.5 + (3)^2 = 10.5$
$\nu_3 = 0 + (3 \times 3 \times 10.5) - 2(3)^3 = 94.5 - 54 = 40.5$
$\nu_4 = 6 + 4(3)(40.5) - 6(3)^2(10.5) + 3(3)^4 = 6 + 486 - 567 + 243 = 168$
Measure of Skewness based on Moments
A measure of skewness is obtained by making use of the third
moment about the mean. When the method of moments is applied, $\beta_1$ is
used as a relative measure of skewness. $\beta_1$ is defined as

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad\text{or}\quad \sqrt{\beta_1} = \frac{\mu_3}{\sqrt{\mu_2^3}}$$

$\sqrt{\beta_1}$ is also symbolised as $\gamma_1$. The value of $\beta_1$ shall be zero for a perfectly
symmetrical distribution. The greater the value of $\beta_1$, the more skewed is the distribution.

Also,

$$Sk = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$$

There is no limit in theory to the above measure, and this is a drawback.
This method gives good results only in the case of distributions having moderate
skewness. If the shape of a curve is much different from a bell-shaped
curve, the above method may give absurd results.
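A sketch of these moment-based measures. Taking the sign of $\sqrt{\beta_1}$ (and of Sk) from $\mu_3$ is an assumption made here, since $\beta_1$ itself is defined through $\mu_3^2$. The checks use $\mu_2 = 145$ and $\mu_3 = -300$ from Illustration 11 and the normal-curve values $\beta_1 = 0$, $\beta_2 = 3$.

```python
import math

def beta1_gamma1(mu2, mu3):
    """beta1 = mu3^2 / mu2^3; gamma1 = mu3 / mu2^(3/2), carrying the sign of mu3."""
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / mu2 ** 1.5
    return beta1, gamma1

def pearson_sk_from_betas(beta1, beta2, sign=1.0):
    """Sk = sqrt(beta1) (beta2 + 3) / [2 (5 beta2 - 6 beta1 - 9)], signed by `sign`."""
    return sign * math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))

b1, g1 = beta1_gamma1(145, -300)  # Illustration 11
print(round(b1, 4), round(g1, 3))
print(pearson_sk_from_betas(0.0, 3.0))  # normal curve: no skewness
```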


Illustration 11. Measure skewness based on moments from the following
data:

Marks    Frequency      Marks    Frequency
0–10         5          30–40        45
10–20       20          40–50        10
20–30       15          50–60         5

Solution. COMPUTATION OF SKEWNESS BASED ON MOMENTS

Marks      f    m.p. m   d' = (m − 35)/10    fd'    fd'²    fd'³
0–10       5      5           −3            −15     45    −135
10–20     20     15           −2            −40     80    −160
20–30     15     25           −1            −15     15     −15
30–40     45     35            0              0      0       0
40–50     10     45            1             10     10      10
50–60      5     55            2             10     20      40
        N = 100                       Σfd' = −50  Σfd'² = 170  Σfd'³ = −260

$\mu_1' = \frac{\Sigma fd'}{N} \times C = \frac{-50}{100} \times 10 = -5$

$\mu_2' = \frac{\Sigma fd'^2}{N} \times C^2 = \frac{170}{100} \times 100 = 170$

$\mu_3' = \frac{\Sigma fd'^3}{N} \times C^3 = \frac{-260}{100} \times 1{,}000 = -2{,}600$

$\mu_2 = \mu_2' - (\mu_1')^2 = 170 - (-5)^2 = 145$

$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = -2{,}600 - 3(-5)(170) + 2(-5)^3 = -2{,}600 + 2{,}550 - 250 = -300$

$\sqrt{\beta_1} = \frac{\mu_3}{\sqrt{\mu_2^3}} = \frac{-300}{\sqrt{(145)^3}} = \frac{-300}{1{,}746} = -0.17$
KURTOSIS* 


Besides averages, variation and skewness, a fourth characteristic used
for the description and comparison of frequency distributions is the
peakedness of the distribution. Measures of peakedness are known as measures
of kurtosis.


Kurtosis in Greek means "bulginess". In statistics, kurtosis refers to
the degree of flatness or peakedness in the region about the mode of a
frequency curve. The degree of kurtosis of a distribution is measured
relative to the peakedness of the normal curve. In other words, measures of
kurtosis tell us the extent to which a distribution is more peaked or
flat-topped than the normal curve.* If a curve is more peaked than the
normal curve, it is called 'leptokurtic'. In such a case the items are more
closely bunched around the mode. On the other hand, if a curve is more
flat-topped than the normal curve, it is called 'platykurtic'. The normal curve
itself is known as 'mesokurtic'. The condition of peakedness or flat-toppedness
itself is known as kurtosis or excess.† The concept of kurtosis is
rarely used in elementary statistical analysis.

The following diagram illustrates the shapes of the three different curves
mentioned above:

[Diagram: three frequency curves, A: mesokurtic, B: leptokurtic, C: platykurtic]

* "Kurtosis is the degree of peakedness of a distribution, usually taken relative
to a normal distribution." Theory and Problems of Statistics, p. 93.

† Waugh: Elements of Statistical Methods.

The above diagram clearly shows that these curves differ widely with
regard to convexity, an attribute which Karl Pearson referred to as
'kurtosis'. Curve A is a normal one and is called 'mesokurtic'. Curve B is
more peaked than A and is called 'leptokurtic'. A leptokurtic curve has a
narrower central portion and higher tails than does the normal curve.
Curve C is less peaked (or more flat-topped) than curve A and is called
'platykurtic'. As may be seen from the diagram, such a curve has a
broader central portion and lower tails.


A famous British statistician, William S. Gosset ("Student"), has very
humorously pointed out the nature of these curves in the sentence:
"Platykurtic curves, like the platypus, are squat with short tails;
leptokurtic curves are high with long tails, like the kangaroos, noted for
'lepping'." Gosset's little sketch is reproduced below:


Measures of Kurtosis 


The most important measure of kurtosis is the value of the coefficient
$\beta_2$. It is defined as:

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

where $\mu_4$ = 4th moment and $\mu_2$ = 2nd moment.

The greater the value of $\beta_2$, the more peaked the distribution.

For a normal curve the value of $\beta_2$ = 3. When the value of $\beta_2$ is
greater than 3, the curve is more peaked than the normal curve, i.e.,
leptokurtic. When the value of $\beta_2$ is less than 3, the curve is less peaked
than the normal curve, i.e., platykurtic. The normal curve and other
curves with $\beta_2$ = 3 are called mesokurtic.


Sometimes $\gamma_2$, which is derived from $\beta_2$, is used as a measure of kurtosis.
$\gamma_2$ is defined as

$$\gamma_2 = \beta_2 - 3$$

For a normal distribution $\gamma_2$ = 0. If $\gamma_2$ is positive, the curve is
leptokurtic, and if $\gamma_2$ is negative, the curve is platykurtic.
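The classification by $\beta_2$ and $\gamma_2$ can be sketched as below. The tolerance `tol` is an illustrative parameter introduced here, not part of the text, since computed moments rarely give $\beta_2$ exactly equal to 3. The examples use central moments from Illustrations 13, 7 and 8 of this chapter.

```python
def classify_kurtosis(mu2, mu4, tol=1e-9):
    """Return (beta2, gamma2, label), comparing beta2 with the normal-curve value 3."""
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3
    if abs(gamma2) <= tol:  # tol is an illustrative tolerance, not from the text
        label = "mesokurtic"
    elif gamma2 > 0:
        label = "leptokurtic"
    else:
        label = "platykurtic"
    return beta2, gamma2, label

print(classify_kurtosis(2.5, 18.75))           # Illustration 13
print(classify_kurtosis(14 / 15, 38 / 15))     # Illustration 7
print(classify_kurtosis(636.96, 1338557.875))  # Illustration 8
```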


Illustration 12. Compute the coefficient of skewness and kurtosis based on
moments for the following distribution:

X:   4.5   14.5   24.5   34.5   44.5   54.5   64.5   74.5   84.5   94.5
f:    1      5     12     22     17      9      4      3      1      1
(M. Com., Delhi, 1959)

Solution. CALCULATION OF SKEWNESS AND KURTOSIS

  X      f    d' = (X − 44.5)/10    fd'    fd'²    fd'³    fd'⁴
 4.5     1           −4             −4      16     −64     256
14.5     5           −3            −15      45    −135     405
24.5    12           −2            −24      48     −96     192
34.5    22           −1            −22      22     −22      22
44.5    17            0              0       0       0       0
54.5     9            1              9       9       9       9
64.5     4            2              8      16      32      64
74.5     3            3              9      27      81     243
84.5     1            4              4      16      64     256
94.5     1            5              5      25     125     625
      N = 75                  Σfd' = −30  Σfd'² = 224  Σfd'³ = −6  Σfd'⁴ = 2,072

$\mu_1' = \frac{\Sigma fd'}{N} = \frac{-30}{75} = -0.4$;  $\mu_2' = \frac{\Sigma fd'^2}{N} = \frac{224}{75} = 2.99$*

$\mu_3' = \frac{\Sigma fd'^3}{N} = \frac{-6}{75} = -0.08$;  $\mu_4' = \frac{\Sigma fd'^4}{N} = \frac{2{,}072}{75} = 27.63$

$\mu_2 = \mu_2' - (\mu_1')^2 = 2.99 - (-0.4)^2 = 2.99 - 0.16 = 2.83$

$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = -0.08 - 3(-0.4)(2.99) + 2(-0.4)^3 = -0.08 + 3.588 - 0.128 = 3.38$

$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 = 27.63 - 4(-0.4)(-0.08) + 6(-0.4)^2(2.99) - 3(-0.4)^4$
$\quad = 27.63 - 0.128 + 2.870 - 0.077 = 30.295$

Skewness: $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(3.38)^2}{(2.83)^3} = \frac{11.42}{22.67} = 0.504$

For kurtosis we have to compute the value of $\beta_2$:

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{30.295}{(2.83)^2} = \frac{30.295}{8.01} = 3.78$

Since the value of $\beta_2$ is greater than 3, the curve is more peaked than the normal
curve, i.e., leptokurtic.
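The calculation of Illustration 12 can be repeated with exact fractions from Python's `fractions` module, working, as the text does, with moments in units of d'. Without intermediate rounding, $\beta_1 \approx 0.505$ and $\beta_2 \approx 3.79$, close to the hand-computed 0.504 and 3.78.

```python
from fractions import Fraction

# Illustration 12: step deviations d' = (X - 44.5) / 10 and frequencies.
d = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
f = [1, 5, 12, 22, 17, 9, 4, 3, 1, 1]
N = sum(f)

# Moments about the origin 44.5, left in units of d' (not multiplied by C).
m1, m2, m3, m4 = (Fraction(sum(fi * di ** r for fi, di in zip(f, d)), N) for r in (1, 2, 3, 4))

mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(float(beta1), float(beta2))
```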


Illustration 13. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Test the skewness and kurtosis of the distribution.
(M.Com., Delhi, 1972)


Solution:
Testing Skewness
We are given μ₁ = 0, μ₂ = 2.5, μ₃ = 0.7 and μ₄ = 18.75. Skewness is measured by the coefficient β₁:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

Here μ₃ = 0.7 and μ₂ = 2.5. Substituting the values,

$$\beta_1 = \frac{(0.7)^2}{(2.5)^3} = \frac{0.49}{15.625} = 0.031$$

Since β₁ = 0.031, the distribution is slightly skewed, i.e., it is not perfectly symmetrical.


*We have not multiplied μ₁′, μ₂′, μ₃′ and μ₄′ by the common factors, i.e., 10, 100, 1,000 and 10,000 respectively, because the values of β₁ and β₂ are not affected thereby and the calculations are simplified.


Testing Kurtosis
For testing kurtosis we compute the value of β₂. When a distribution is normal, β₂ = 3. When a distribution is more peaked than the normal, β₂ is more than 3, and when it is less peaked than the normal, β₂ is less than 3.

$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \text{where } \mu_4 = 18.75,\ \mu_2 = 2.5$$

$$\beta_2 = \frac{18.75}{(2.5)^2} = \frac{18.75}{6.25} = 3$$

Since β₂ is exactly three, the distribution is mesokurtic.
MISCELLANEOUS ILLUSTRATIONS 
Illustration 14. From the data given below, find which group is more skewed:

Group A : 100  118  122  109  105  107  121  113  105  100
Group B :  18   26   21   25   23   22   17   15   16   17

(M.A. Econ., Jabalpur, 1975)
Solution: COMPUTATION OF COEFFICIENT OF SKEWNESS

               Group A                                   Group B
   X    (X − X̄)  (X − X̄)²  Items in         Y   (Y − Ȳ)  (Y − Ȳ)²  Items in
                            ascending                              ascending
                            order                                  order
  100     −10      100       100            18     −2        4        15
  118       8       64       100            26      6       36        16
  122      12      144       105            21      1        1        17
  109      −1        1       105            25      5       25        17
  105      −5       25       107            23      3        9        18
  107      −3        9       109            22      2        4        21
  121      11      121       113            17     −3        9        22
  113       3        9       118            15     −5       25        23
  105      −5       25       121            16     −4       16        25
  100     −10      100       122            17     −3        9        26
ΣX = 1,100   Σ(X − X̄)² = 598                ΣY = 200   Σ(Y − Ȳ)² = 138
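Group A:

$$\bar{X} = \frac{\Sigma X}{N} = \frac{1100}{10} = 110$$

$$\text{Med.} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of 5.5th item} = \frac{107 + 109}{2} = 108$$

$$\sigma = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N}} = \sqrt{\frac{598}{10}} = \sqrt{59.8} = 7.73$$

$$\text{Coeff. of Sk.} = \frac{3(\bar{X} - \text{Med.})}{\sigma} = \frac{3(110 - 108)}{7.73} = \frac{6}{7.73} = 0.776$$

Group B:

$$\bar{Y} = \frac{\Sigma Y}{N} = \frac{200}{10} = 20$$

$$\text{Med.} = \text{size of } \left(\frac{N+1}{2}\right)\text{th item} = \text{size of 5.5th item} = \frac{18 + 21}{2} = 19.5$$

$$\sigma = \sqrt{\frac{\Sigma (Y - \bar{Y})^2}{N}} = \sqrt{\frac{138}{10}} = \sqrt{13.8} = 3.71$$

$$\text{Coeff. of Sk.} = \frac{3(\bar{Y} - \text{Med.})}{\sigma} = \frac{3(20 - 19.5)}{3.71} = \frac{1.5}{3.71} = 0.404$$

Since the coefficient of skewness is greater in the case of group A, we conclude that group A is more skewed than group B.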
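Illustration 14 can be reproduced with Python's standard `statistics` module. This is a minimal sketch: `pstdev` divides by N, which matches σ = √(Σ(X − X̄)²/N) used above, and the helper name `pearson_sk_median` is our own.

```python
from statistics import mean, median, pstdev


def pearson_sk_median(values):
    """Karl Pearson's coefficient using the median: 3(mean - median) / sigma."""
    # pstdev is the population standard deviation (divisor N), as in the text
    return 3 * (mean(values) - median(values)) / pstdev(values)


group_a = [100, 118, 122, 109, 105, 107, 121, 113, 105, 100]
group_b = [18, 26, 21, 25, 23, 22, 17, 15, 16, 17]
for name, data in (("A", group_a), ("B", group_b)):
    print(name, round(pearson_sk_median(data), 3))
```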
Illustration 15. Calculate Karl Pearson's coefficient of skewness from the following data:

Monthly salary   No. of           Monthly salary   No. of
(Rs.)            stenographers    (Rs.)            stenographers
Below  80             12          Below 120             157
  "    90             30            "   130             202
  "   100             65            "   140             222
  "   110            107            "   150             230

(B.Com., Karnataka, 1973)
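Solution: CALCULATION OF KARL PEARSON'S COEFFICIENT OF SKEWNESS

Monthly salary   No. of            m     d' = (m − 105)/10    fd'    fd'²
(Rs.)            stenographers f
 70–80                12           75          −3             −36    108
 80–90                18           85          −2             −36     72
 90–100               35           95          −1             −35     35
100–110               42          105           0               0      0
110–120               50          115           1              50     50
120–130               45          125           2              90    180
130–140               20          135           3              60    180
140–150                8          145           4              32    128
                   N = 230                               Σfd' = 125  Σfd'² = 753

$$\text{Coeff. of Sk.} = \frac{\bar{X} - \text{Mode}}{\sigma}$$

Calculation of Mean:

$$\bar{X} = A + \frac{\Sigma fd'}{N} \times C, \quad A = 105,\ \Sigma fd' = 125,\ N = 230,\ C = 10$$

$$\bar{X} = 105 + \frac{125}{230} \times 10 = 105 + 5.43 = 110.43$$

Calculation of Mode:
By inspection the mode lies in the class 110–120.

$$\text{Mo} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i, \quad L = 110,\ \Delta_1 = 50 - 42 = 8,\ \Delta_2 = 50 - 45 = 5,\ i = 10$$

$$\text{Mo} = 110 + \frac{8}{8 + 5} \times 10 = 110 + 6.154 = 116.154$$

Calculation of Standard Deviation:

$$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times C, \quad \Sigma fd'^2 = 753,\ \Sigma fd' = 125,\ N = 230,\ C = 10$$

$$\sigma = \sqrt{\frac{753}{230} - \left(\frac{125}{230}\right)^2} \times 10 = \sqrt{3.274 - 0.295} \times 10 = 1.726 \times 10 = 17.26$$

X̄ = 110.43, Mo = 116.154, σ = 17.26

$$\text{Coeff. of Sk.} = \frac{110.43 - 116.154}{17.26} = \frac{-5.724}{17.26} = -0.332$$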
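The grouped-data working of Illustration 15 can be sketched as follows. This sketch assumes equal class widths and a modal class that is neither the first nor the last (so both neighbouring frequencies exist); it computes the mean and σ directly from the mid-points rather than by step deviations, which gives the same values. Unrounded, the coefficient comes out at about −0.331, against −0.332 from the rounded hand working.

```python
import math

# Illustration 15: lower class limits (Rs.) and frequencies; class width 10
lower = [70, 80, 90, 100, 110, 120, 130, 140]
f = [12, 18, 35, 42, 50, 45, 20, 8]
width = 10
N = sum(f)

mid = [l + width / 2 for l in lower]
mean = sum(fi * m for fi, m in zip(f, mid)) / N
sigma = math.sqrt(sum(fi * (m - mean) ** 2 for fi, m in zip(f, mid)) / N)

# Mode by interpolation: Mo = L + D1 / (D1 + D2) * i
k = f.index(max(f))  # index of the modal (highest-frequency) class
if not 0 < k < len(f) - 1:
    raise ValueError("modal class must have a class on each side")
d1 = f[k] - f[k - 1]
d2 = f[k] - f[k + 1]
mode = lower[k] + d1 / (d1 + d2) * width

sk = (mean - mode) / sigma
print(f"mean={mean:.2f} mode={mode:.3f} sigma={sigma:.2f} Sk={sk:.3f}")
```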


Illustration 16. The weekly wages earned by one hundred workers of a factory are set out in the following table:

Weekly wages (Rs.)   No. of workers     Weekly wages (Rs.)   No. of workers
12.5–17.5                 12            37.5–42.5                 10
17.5–22.5                 16            42.5–47.5                  6
22.5–27.5                 25            47.5–52.5                  3
27.5–32.5                 14            52.5–57.5                  1
32.5–37.5                 13

(i) Calculate the three quartiles of the above distribution.

(ii) Find the absolute measures of dispersion and skewness based on these quartiles.

(iii) Interpret all the five measures that you have calculated in (i) and (ii) above.

(C.A., Nov., 1972)

Solution: CALCULATION OF QUARTILES AND COEFFICIENT OF SKEWNESS

Weekly wages (Rs.)    f    c.f.     Weekly wages (Rs.)    f    c.f.
12.5–17.5            12     12      37.5–42.5            10     90
17.5–22.5            16     28      42.5–47.5             6     96
22.5–27.5            25     53      47.5–52.5             3     99
27.5–32.5            14     67      52.5–57.5             1    100
32.5–37.5            13     80

(i) Q₁ = size of N/4th item = 100/4 = 25th item
Q₁ lies in the class 17.5–22.5.

$$Q_1 = L + \frac{N/4 - c.f.}{f} \times i, \quad L = 17.5,\ N/4 = 25,\ c.f. = 12,\ f = 16,\ i = 5$$

$$Q_1 = 17.5 + \frac{25 - 12}{16} \times 5 = 17.5 + 4.06 = 21.56$$

Q₂ = size of 2N/4th item = 2 × 100/4 = 50th item
Q₂ lies in the class 22.5–27.5.

$$Q_2 = L + \frac{2N/4 - c.f.}{f} \times i, \quad L = 22.5,\ 2N/4 = 50,\ c.f. = 28,\ f = 25,\ i = 5$$

$$Q_2 = 22.5 + \frac{50 - 28}{25} \times 5 = 22.5 + 4.4 = 26.9$$

Q₃ = size of 3N/4th item = 3 × 100/4 = 75th item
Q₃ lies in the class 32.5–37.5.

$$Q_3 = L + \frac{3N/4 - c.f.}{f} \times i, \quad L = 32.5,\ 3N/4 = 75,\ c.f. = 67,\ f = 13,\ i = 5$$

$$Q_3 = 32.5 + \frac{75 - 67}{13} \times 5 = 32.5 + 3.08 = 35.58$$

(ii) Absolute measures of dispersion and skewness

Dispersion:
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{35.58 - 21.56}{2} = 7.01$$

Skewness:
$$\text{Sk} = Q_3 + Q_1 - 2\,\text{Med.} = 35.58 + 21.56 - (2 \times 26.9) = 57.14 - 53.8 = 3.34$$
(iii) Significance of the Measures

The first quartile (the value of the N/4th item), Rs. 21.56, signifies that 25% of the workers in the factory received wages less than Rs. 21.56.

The second quartile (the value of the 2N/4th item), Rs. 26.9, signifies that 50% of the workers received wages less than Rs. 26.9 and 50% received wages more than this amount.

The third quartile (the value of the 3N/4th item), Rs. 35.58, signifies that 25% of the workers received more than this amount.

The dispersion calculated on the basis of the quartiles, i.e., the quartile deviation considered in relation to the median, will disclose whether or not the distribution is 'normal' (symmetrical). Thus, if Median + Q.D. is equal to Q₃ and Median − Q.D. is equal to Q₁, the distribution would be regarded as normal or symmetrical.

In this question, Median + Q.D. = 26.9 + 7.01 = 33.91, and Median − Q.D. = 26.9 − 7.01 = 19.89. Since these figures differ appreciably from Q₃ (35.58) and Q₁ (21.56), it can be inferred that the distribution is not 'normal'.

The absolute measure of skewness, Rs. 3.34, indicates that there is a larger concentration of items towards Q₁, i.e., the distribution is positively skewed. Such an asymmetrical distribution, when graphically represented, would tend to tail off towards the right.
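The interpolation formula used for Q₁, Q₂ and Q₃ is the same at every step, so it can be written once. This is a minimal sketch, assuming continuous classes of equal width given by their lower (real) limits; the function name `grouped_quantile` is hypothetical. When pN falls exactly on a cumulative frequency, it picks the class that ends there, which is the convention Illustration 17 follows for Q₁.

```python
from itertools import accumulate


def grouped_quantile(lower, freq, width, p):
    """Interpolated quantile of a grouped table: L + (pN - c.f.) / f * i."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    target = p * sum(freq)  # N/4 for Q1, N/2 for the median, 3N/4 for Q3
    cf_before = 0  # cumulative frequency of the classes preceding the current one
    for low, f, cf in zip(lower, freq, accumulate(freq)):
        if f > 0 and cf >= target:
            return low + (target - cf_before) / f * width
        cf_before = cf
    raise ValueError("empty frequency table")


# Illustration 16: weekly wages, classes 12.5-17.5, ..., 52.5-57.5
lower = [12.5 + 5 * k for k in range(9)]
freq = [12, 16, 25, 14, 13, 10, 6, 3, 1]
q1, q2, q3 = (grouped_quantile(lower, freq, 5, p) for p in (0.25, 0.5, 0.75))
qd = (q3 - q1) / 2
sk_abs = q3 + q1 - 2 * q2  # Bowley's absolute measure
print(f"Q1={q1:.2f} Q2={q2:.2f} Q3={q3:.2f} Q.D.={qd:.2f} Sk={sk_abs:.2f}")
```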

Illustration 17. Convert the following into an ordinary frequency table and obtain the values of quartile deviation and coefficient of skewness:

Marks below     :  80   70   60   50   40   30   20   10
No. of students : 240  190  125   95   75   60   40   25

(B.Com., Mysore, 1973)


Solution: CALCULATION OF QUARTILE DEVIATION AND COEFFICIENT OF SKEWNESS

Marks     f    c.f.     Marks     f    c.f.
0–10     25     25      40–50    20     95
10–20    15     40      50–60    30    125
20–30    20     60      60–70    65    190
30–40    15     75      70–80    50    240

Calculation of Quartile Deviation:

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

Q₁ = size of N/4th item = 240/4 = 60th item
Q₁ lies in the class 20–30.

$$Q_1 = L + \frac{N/4 - c.f.}{f} \times i, \quad L = 20,\ N/4 = 60,\ c.f. = 40,\ f = 20,\ i = 10$$

$$Q_1 = 20 + \frac{60 - 40}{20} \times 10 = 20 + 10 = 30$$

Q₃ = size of 3N/4th item = 3 × 240/4 = 180th item
Q₃ lies in the class 60–70.

$$Q_3 = L + \frac{3N/4 - c.f.}{f} \times i, \quad L = 60,\ 3N/4 = 180,\ c.f. = 125,\ f = 65,\ i = 10$$

$$Q_3 = 60 + \frac{180 - 125}{65} \times 10 = 60 + 8.46 = 68.46$$

$$\text{Q.D.} = \frac{68.46 - 30}{2} = \frac{38.46}{2} = 19.23$$


Calculation of Coefficient of Skewness
Since the question asks us to calculate quartile deviation and coefficient of skewness, the implication is to use Bowley's method of finding out skewness:

$$\text{Coeff. of Sk.} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$

Med. = size of N/2th item = 240/2 = 120th item
Median lies in the class 50–60.

$$\text{Med.} = L + \frac{N/2 - c.f.}{f} \times i, \quad L = 50,\ N/2 = 120,\ c.f. = 95,\ f = 30,\ i = 10$$

$$\text{Med.} = 50 + \frac{120 - 95}{30} \times 10 = 50 + 8.33 = 58.33$$

$$\text{Coeff. of Sk.} = \frac{68.46 + 30 - 2(58.33)}{68.46 - 30} = \frac{98.46 - 116.66}{38.46} = \frac{-18.20}{38.46} = -0.473$$

Illustration 18. (a) For moderately skewed data, the arithmetic mean is 100, the coefficient of variation is 35 and Karl Pearson's coefficient of skewness is 0.2. Find the mode and the median.
(B.Com., Bombay, 1974)


Solution: Given X̄ = 100, C.V. = 35, Sk = 0.2. We have to find the mode and the median.

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 \quad\Rightarrow\quad 35 = \frac{\sigma}{100} \times 100 \quad\Rightarrow\quad \sigma = 35$$

Calculating Mode:

$$\text{Sk} = \frac{\bar{X} - \text{Mode}}{\sigma} \quad\Rightarrow\quad 0.2 = \frac{100 - \text{Mode}}{35}$$

100 − Mode = 7, ∴ Mode = 93

Calculating Median:
Mode = 3 Median − 2 Mean
93 = 3 Median − 2 × 100
3 Median = 93 + 200 = 293, ∴ Median = 97.67

Illustration 18. (b) Karl Pearson's coefficient of skewness of a distribution is +0.40. Its standard deviation is 8 and mean is 30. Find the mode and median of the distribution.
(B.Com., Delhi, 1973)

Solution: Given Sk = 0.4, σ = 8, X̄ = 30.
We are to calculate the values of mode and median. Substituting the given values in Karl Pearson's formula for skewness, i.e.,

$$\text{Sk} = \frac{\bar{X} - \text{Mode}}{\sigma}$$

$$0.4 = \frac{30 - \text{Mode}}{8}$$

0.4 × 8 = 30 − Mode
30 − Mode = 3.2
Mode = 30 − 3.2 = 26.8

Mode = 3 Median − 2 Mean
26.8 = 3 Median − 2 × 30
3 Median = 60 + 26.8 = 86.8, ∴ Median = 28.93
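Both parts of Illustration 18 rest on two relations: Sk = (X̄ − Mode)/σ and the empirical rule Mode = 3 Median − 2 Mean, which holds only approximately and only for moderately skewed distributions. A minimal sketch (the function name `mode_and_median` is our own):

```python
def mode_and_median(mean, sigma, sk):
    """Solve Sk = (mean - mode) / sigma, then Mode = 3 Median - 2 Mean."""
    mode = mean - sk * sigma
    median = (mode + 2 * mean) / 3
    return mode, median


# Illustration 18(a): sigma is recovered from C.V. = sigma / mean * 100
mean_a, cv_a = 100, 35
sigma_a = cv_a * mean_a / 100
print(mode_and_median(mean_a, sigma_a, 0.2))

# Illustration 18(b)
print(mode_and_median(30, 8, 0.4))
```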


Illustration 19. From the information given below, calculate Karl Pearson's coefficient of skewness and also the quartile coefficient of skewness:

Measure           Place A   Place B
Mean                150       140
Median              142       155
S.D.                 30        55
Third Quartile      195       260
First Quartile       62        80
Solution: Calculation of Karl Pearson's coefficient of skewness

$$\text{Sk} = \frac{\bar{X} - \text{Mode}}{\sigma}$$

Place A: X̄ = 150, σ = 30
The value of mode is not given but can be ascertained from the relationship:
Mode = 3 Median − 2 Mean = 3 × 142 − 2 × 150 = 426 − 300 = 126

$$\text{Sk} = \frac{150 - 126}{30} = \frac{24}{30} = +0.8$$

Place B: X̄ = 140, σ = 55
Mode = 3 Median − 2 Mean = 3 × 155 − 2 × 140 = 465 − 280 = 185

$$\text{Sk} = \frac{140 - 185}{55} = \frac{-45}{55} = -0.82$$

Quartile Coefficient of Skewness:

$$\text{Sk} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$

Place A: Q₃ = 195, Q₁ = 62, Med. = 142

$$\text{Sk} = \frac{195 + 62 - (2 \times 142)}{195 - 62} = \frac{257 - 284}{133} = \frac{-27}{133} = -0.203$$

Place B: Q₃ = 260, Q₁ = 80, Med. = 155

$$\text{Sk} = \frac{260 + 80 - (2 \times 155)}{260 - 80} = \frac{30}{180} = +0.167$$
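The two coefficients compared in Illustration 19 can be computed side by side. In this minimal sketch (function names are our own), note that when the mode is obtained from the empirical relation, Karl Pearson's coefficient reduces to 3(X̄ − Median)/σ, which is what `pearson_sk` evaluates.

```python
def pearson_sk(mean, median, sd):
    """Karl Pearson's Sk with Mode = 3 Median - 2 Mean, i.e. 3(mean - median) / sd."""
    mode = 3 * median - 2 * mean
    return (mean - mode) / sd


def bowley_sk(q1, median, q3):
    """Bowley's (quartile) coefficient of skewness."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# (mean, median, S.D., Q1, Q3) for each place, from the table above
places = {"A": (150, 142, 30, 62, 195), "B": (140, 155, 55, 80, 260)}
for name, (mean, median, sd, q1, q3) in places.items():
    print(name, round(pearson_sk(mean, median, sd), 3), round(bowley_sk(q1, median, q3), 3))
```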


Illustration 20. Calculate the quartile coefficient of skewness of the following frequency distribution:

Weight        No. of persons     Weight         No. of persons
Under 100            1           150–159              65
100–109             14           160–169              31
110–119             66           170–179              12
120–129            122           180–189               5
130–139            145           190–199               2
140–149            121           200 and over          2

(B.A. Hons. Econ., Delhi, 1972)
Solution: CALCULATION OF QUARTILE COEFFICIENT OF SKEWNESS

Weight       No. of persons   c.f.    Weight         No. of persons   c.f.
Under 100          1            1     150–159              65          534
100–109           14           15     160–169              31          565
110–119           66           81     170–179              12          577
120–129          122          203     180–189               5          582
130–139          145          348     190–199               2          584
140–149          121          469     200 and over          2          586

$$\text{Coeff. of Sk.} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$

Calculation of Q₁: Q₁ = size of N/4th item = 586/4 = 146.5th item
Q₁ lies in the class 120–129. But the real limits of this class are 119.5–129.5.

$$Q_1 = L + \frac{N/4 - c.f.}{f} \times i, \quad L = 119.5,\ N/4 = 146.5,\ c.f. = 81,\ f = 122,\ i = 10$$

$$Q_1 = 119.5 + \frac{146.5 - 81}{122} \times 10 = 119.5 + 5.37 = 124.87$$

Calculation of Q₃: Q₃ = size of 3N/4th item = 3 × 586/4 = 439.5th item
Q₃ lies in the class 140–149. But the real limits of this class are 139.5–149.5.

$$Q_3 = L + \frac{3N/4 - c.f.}{f} \times i, \quad L = 139.5,\ 3N/4 = 439.5,\ c.f. = 348,\ f = 121,\ i = 10$$

$$Q_3 = 139.5 + \frac{439.5 - 348}{121} \times 10 = 139.5 + 7.56 = 147.06 \approx 147.1$$

Calculation of Median: Med. = size of N/2th item = 586/2 = 293rd item
Median lies in the class 130–139. But the real limits of this class are 129.5–139.5.

$$\text{Med.} = L + \frac{N/2 - c.f.}{f} \times i, \quad L = 129.5,\ N/2 = 293,\ c.f. = 203,\ f = 145,\ i = 10$$

$$\text{Med.} = 129.5 + \frac{293 - 203}{145} \times 10 = 129.5 + 6.21 = 135.71 \approx 135.7$$

Q₁ = 124.87, Q₃ = 147.1, Med. = 135.7

$$\text{Coeff. of Sk.} = \frac{147.1 + 124.87 - 2(135.7)}{147.1 - 124.87} = \frac{271.97 - 271.4}{22.23} = \frac{0.57}{22.23} = 0.026$$


Illustration 21. Calculate quartile deviation and a measure of skewness of the following distribution and comment on the result:

Expenditure on   No. of families of    Expenditure on   No. of families of
food (Rs.)       factory employees     food (Rs.)       factory employees
55.5–57.5                2             65.5–67.5               20
57.5–59.5                4             67.5–69.5                9
59.5–61.5                9             69.5–71.5                2
61.5–63.5               30             71.5–73.5                1
63.5–65.5               23

(B.A. Hons. Econ., Delhi, 1973)
Solution: CALCULATION OF QUARTILE DEVIATION AND COEFFICIENT OF SKEWNESS

Expenditure on food (Rs.)   No. of families f   c.f.
55.5–57.5                          2               2
57.5–59.5                          4               6
59.5–61.5                          9              15
61.5–63.5                         30              45
63.5–65.5                         23              68
65.5–67.5                         20              88
67.5–69.5                          9              97
69.5–71.5                          2              99
71.5–73.5                          1             100
                              N = 100

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

Q₁ = size of N/4th item = 100/4 = 25th item
Q₁ lies in the class 61.5–63.5.

$$Q_1 = L + \frac{N/4 - c.f.}{f} \times i, \quad L = 61.5,\ N/4 = 25,\ c.f. = 15,\ f = 30,\ i = 2$$

$$Q_1 = 61.5 + \frac{25 - 15}{30} \times 2 = 61.5 + 0.67 = 62.17$$

Q₃ = size of 3N/4th item = 3 × 100/4 = 75th item
Q₃ lies in the class 65.5–67.5.

$$Q_3 = L + \frac{3N/4 - c.f.}{f} \times i, \quad L = 65.5,\ 3N/4 = 75,\ c.f. = 68,\ f = 20,\ i = 2$$

$$Q_3 = 65.5 + \frac{75 - 68}{20} \times 2 = 65.5 + 0.7 = 66.2$$

$$\text{Q.D.} = \frac{66.2 - 62.17}{2} = \frac{4.03}{2} = 2.015$$
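Since we were asked to calculate quartile deviation and coefficient of skewness, by implication we should use Bowley's method of finding skewness:

$$\text{Coeff. of Sk.} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$$

Med. = size of N/2th item = 100/2 = 50th item
Median lies in the class 63.5–65.5.

$$\text{Med.} = L + \frac{N/2 - c.f.}{f} \times i, \quad L = 63.5,\ N/2 = 50,\ c.f. = 45,\ f = 23,\ i = 2$$

$$\text{Med.} = 63.5 + \frac{50 - 45}{23} \times 2 = 63.5 + 0.43 = 63.93$$

Q₃ = 66.2, Q₁ = 62.17, Med. = 63.93

$$\text{Coeff. of Sk.} = \frac{66.2 + 62.17 - 2(63.93)}{66.2 - 62.17} = \frac{0.51}{4.03} = 0.127$$

The small positive value indicates that the distribution is only slightly positively skewed.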
Illustration 24.

(a) The standard deviation of a symmetrical distribution is 3. What must be the value of the fourth moment about the mean in order that the distribution be mesokurtic?

(b) If the first four moments of a distribution about the value 5 are equal to −4, 22, −117 and 560, determine the corresponding moments:
(i) about the mean, and
(ii) about zero.

(M.Com., Delhi, 1972)
Solution. (a) For a mesokurtic distribution β₂ = 3, where

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

We are given σ = 3, so μ₂ = σ² = (3)² = 9. With β₂ = 3 and μ₂ = 9,

$$3 = \frac{\mu_4}{81} \quad\Rightarrow\quad \mu_4 = 243$$

Thus the fourth moment about the mean must be 243 in order that the distribution be mesokurtic.

(b) We are given moments about an arbitrary origin 5:
μ₁′ = −4, μ₂′ = 22, μ₃′ = −117, μ₄′ = 560.

Moments about Mean:
From these we can find the moments about the mean from the following relationships:

$$\mu_2 = \mu_2' - (\mu_1')^2$$

$$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3$$

$$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4$$

Substituting the values,

$$\mu_2 = 22 - (-4)^2 = 22 - 16 = 6$$

$$\mu_3 = -117 - 3(-4)(22) + 2(-4)^3 = -117 + 264 - 128 = 19$$

$$\mu_4 = 560 - 4(-4)(-117) + 6(-4)^2(22) - 3(-4)^4 = 560 - 1872 + 2112 - 768 = 32$$

Thus the moments about the mean are μ₁ = 0, μ₂ = 6, μ₃ = 19 and μ₄ = 32.


Moments about Zero:
Let the moments about zero be denoted by ν₁, ν₂, ν₃, etc.
The first moment about zero is the mean itself: ν₁ = A + μ₁′
The second: ν₂ = μ₂ + ν₁²
The third: ν₃ = μ₃ + 3ν₁ν₂ − 2ν₁³
The fourth: ν₄ = μ₄ + 4ν₁ν₃ − 6ν₁²ν₂ + 3ν₁⁴
Substituting these values,
ν₁ = 5 + (−4) = 1
ν₂ = 6 + (1)² = 7
ν₃ = 19 + 3(1)(7) − 2(1)³ = 19 + 21 − 2 = 38
ν₄ = 32 + 4(1)(38) − 6(1)²(7) + 3(1)⁴ = 32 + 152 − 42 + 3 = 145
The moments about zero are ν₁ = 1, ν₂ = 7, ν₃ = 38, ν₄ = 145.
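The two sets of relationships used in Illustration 24(b) can be packaged as conversion functions. In this minimal sketch (names are our own), `to_central` takes moments about an arbitrary origin and `central_to_zero` takes the mean and the central moments. With integer inputs the arithmetic is exact, so the answers above are reproduced exactly.

```python
def to_central(m1, m2, m3, m4):
    """Moments about an arbitrary origin -> moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return 0, mu2, mu3, mu4


def central_to_zero(mean, mu2, mu3, mu4):
    """Mean and central moments -> moments about zero (nu1 is the mean)."""
    v1 = mean
    v2 = mu2 + v1 ** 2
    v3 = mu3 + 3 * v1 * v2 - 2 * v1 ** 3
    v4 = mu4 + 4 * v1 * v3 - 6 * v1 ** 2 * v2 + 3 * v1 ** 4
    return v1, v2, v3, v4


A = 5  # the arbitrary origin
raw = (-4, 22, -117, 560)
central = to_central(*raw)
print(central)                                    # (0, 6, 19, 32)
print(central_to_zero(A + raw[0], *central[1:]))  # (1, 7, 38, 145)
```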


Illustration 25. Calculate the first four moments of the following distribution about x = 40.45 and thence find out the moments about the mean:

Hours worked   No. of industries    Hours worked   No. of industries
30.0–32.9              2            39.0–41.9             47
33.0–35.9              4            42.0–44.9             15
36.0–38.9             26            45.0–47.9              6

(T.D.C., Raj., 197…)
Solution: CALCULATING MOMENTS ABOUT ARBITRARY ORIGIN

Hours worked   f     m.p.    d = (m.p. − 40.45)/3    fd    fd²    fd³    fd⁴
30.0–32.9      2    31.45           −3               −6     18    −54    162
33.0–35.9      4    34.45           −2               −8     16    −32     64
36.0–38.9     26    37.45           −1              −26     26    −26     26
39.0–41.9     47    40.45            0                0      0      0      0
42.0–44.9     15    43.45            1               15     15     15     15
45.0–47.9      6    46.45            2               12     24     48     96
          N = 100                                 Σfd = −13  Σfd² = 99  Σfd³ = −49  Σfd⁴ = 363

Moments about arbitrary origin 40.45 (C = 3):

$$\mu_1' = \frac{\Sigma fd}{N} \times C = \frac{-13}{100} \times 3 = -0.39$$

$$\mu_2' = \frac{\Sigma fd^2}{N} \times C^2 = \frac{99}{100} \times 9 = 8.91$$

$$\mu_3' = \frac{\Sigma fd^3}{N} \times C^3 = \frac{-49}{100} \times 27 = -13.23$$

$$\mu_4' = \frac{\Sigma fd^4}{N} \times C^4 = \frac{363}{100} \times 81 = 294.03$$

Moments about mean
μ₁ = 0

$$\mu_2 = \mu_2' - (\mu_1')^2 = 8.91 - (-0.39)^2 = 8.91 - 0.152 = 8.76$$

$$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = -13.23 - 3(-0.39)(8.91) + 2(-0.39)^3 = -13.23 + 10.425 - 0.119 = -2.924$$

$$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 = 294.03 - 4(-0.39)(-13.23) + 6(-0.39)^2(8.91) - 3(-0.39)^4$$

$$= 294.03 - 20.639 + 8.131 - 0.069 = 281.45$$
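Hand working of μ₄ is easy to slip on, so a direct check is useful. This minimal sketch computes the central moments of Illustration 25 straight from the mid-points, without any arbitrary origin; it gives μ₂ ≈ 8.758, μ₃ ≈ −2.924 and μ₄ ≈ 281.45. Note that in the step-deviation formula for μ₄ the term 6(μ₁′)²μ₂′ enters with a plus sign.

```python
# Illustration 25: mid-points and frequencies of the hours-worked table
mid = [31.45, 34.45, 37.45, 40.45, 43.45, 46.45]
f = [2, 4, 26, 47, 15, 6]
N = sum(f)
mean = sum(fi * m for fi, m in zip(f, mid)) / N


def central_moment(r):
    """r-th moment about the mean, computed directly from the mid-points."""
    return sum(fi * (m - mean) ** r for fi, m in zip(f, mid)) / N


mu2, mu3, mu4 = (central_moment(r) for r in (2, 3, 4))
print(f"mean={mean:.2f} mu2={mu2:.3f} mu3={mu3:.3f} mu4={mu4:.2f}")
```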


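Illustration 24. You are given the following values of moments:
μ₂ = 43.353, μ₃ = −9.774, μ₄ = 5508.567.
Find the corrected values of each one of these, taking into account the class interval, which is 3.
(M.Com., Raj., 1973)

Solution: Applying Sheppard's corrections with class interval i = 3:

$$\mu_2 \text{ (corrected)} = \mu_2 \text{ (uncorrected)} - \frac{i^2}{12} = 43.353 - \frac{9}{12} = 43.353 - 0.750 = 42.603$$

μ₃ requires no correction: μ₃ (corrected) = μ₃ (uncorrected) = −9.774

$$\mu_4 \text{ (corrected)} = \mu_4 \text{ (uncorrected)} - \frac{i^2}{2}\,\mu_2 \text{ (uncorrected)} + \frac{7}{240}\,i^4$$

$$= 5508.567 - \frac{9}{2}(43.353) + \frac{7 \times 81}{240} = 5508.567 - 195.0885 + 2.3625 = 5315.841$$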
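Sheppard's corrections as applied above can be written as a small function. This is a minimal sketch (the function name is our own); the corrections are usually recommended only for continuous distributions whose frequencies taper off to zero at both ends, and the third moment is left unchanged.

```python
def sheppard_corrections(mu2, mu3, mu4, h):
    """Sheppard's corrections for grouping with class interval h."""
    mu2_c = mu2 - h ** 2 / 12
    mu3_c = mu3  # the third moment needs no correction
    mu4_c = mu4 - (h ** 2 / 2) * mu2 + 7 * h ** 4 / 240
    return mu2_c, mu3_c, mu4_c


print(sheppard_corrections(43.353, -9.774, 5508.567, 3))
```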
Illustration 26. The following data are given to an economist for the purpose of economic analysis. The data refer to the length of life of a sample of Good Year tyres. Do you think that the distribution is platykurtic?

N = 100, Σfd = 50, Σfd² = 1,967.2, Σfd³ = 2,925.8, Σfd⁴ = 86,650.2

(M.Com., Delhi, 1971)


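Solution. In order to ascertain whether the distribution is platykurtic or not, we have to calculate the value of β₂:

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

$$\mu_1' = \frac{\Sigma fd}{N} = \frac{50}{100} = 0.5, \qquad \mu_2' = \frac{\Sigma fd^2}{N} = \frac{1967.2}{100} = 19.672$$

$$\mu_3' = \frac{\Sigma fd^3}{N} = \frac{2925.8}{100} = 29.258, \qquad \mu_4' = \frac{\Sigma fd^4}{N} = \frac{86650.2}{100} = 866.502$$

$$\mu_2 = \mu_2' - (\mu_1')^2 = 19.672 - (0.5)^2 = 19.672 - 0.25 = 19.422$$

$$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 = 29.258 - 3(0.5)(19.672) + 2(0.5)^3 = 29.258 - 29.508 + 0.250 = 0$$

$$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 = 866.502 - 4(0.5)(29.258) + 6(0.5)^2(19.672) - 3(0.5)^4$$

$$= 866.502 - 58.516 + 29.508 - 0.1875 = 837.3065$$

$$\beta_2 = \frac{837.3065}{(19.422)^2} = \frac{837.3065}{377.21} = 2.22$$

Since the value of β₂ is less than 3, the distribution is platykurtic.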
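Because β₂ is a pure number, it can be computed directly from N and the four sums Σfd, Σfd², Σfd³ and Σfd⁴, whatever origin and scale the deviations d were measured from. A minimal sketch (the function name is our own):

```python
def beta2_from_sums(n, s1, s2, s3, s4):
    """beta_2 from N and the sums of fd, fd^2, fd^3 and fd^4."""
    m1, m2, m3, m4 = (s / n for s in (s1, s2, s3, s4))  # moments about the origin of d
    mu2 = m2 - m1 ** 2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu4 / mu2 ** 2


b2 = beta2_from_sums(100, 50, 1967.2, 2925.8, 86650.2)
print(round(b2, 2), "platykurtic" if b2 < 3 else "not platykurtic")
```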
Illustration 27. Examine whether the following results of a piece of computation for obtaining the second (central) moment are consistent or not:
N = 50, Σd = −62, Σd² = 64

Solution. The second (central) moment is μ₂ or σ²:

$$\mu_2 = \frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2 = \frac{64}{50} - \left(\frac{-62}{50}\right)^2 = 1.28 - 1.54 = -0.26$$

But the second moment about the mean can never be negative. Hence there is some inconsistency in the information given.
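The check in Illustration 27 amounts to asking whether N·Σd² ≥ (Σd)², which must hold for any real data because μ₂ = Σd²/N − (Σd/N)² cannot be negative. A minimal sketch (the function name is our own), using integers so that the comparison involves no rounding:

```python
def second_moment_consistent(n, sum_d, sum_d2):
    """True if N, sum(d) and sum(d^2) can come from real data (mu2 >= 0)."""
    # mu2 = sum_d2/n - (sum_d/n)^2 >= 0  is equivalent to  n * sum_d2 >= sum_d^2
    return n * sum_d2 >= sum_d ** 2


print(50 * 64, (-62) ** 2)                    # 3200 3844
print(second_moment_consistent(50, -62, 64))  # False: the figures are inconsistent
```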


Meaning

So far we have studied problems relating to one variable only. In practice we come across a large number of problems involving the use of two or more than two variables. If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are correlated. For example, there exists some relationship between age of husband and age of wife, price of a commodity and amount demanded, increase in rainfall up to a point and production of rice, an increase in the number of television licences and number of cinema goers, etc. The degree of relationship between the variables under consideration is measured through correlation analysis. The measure of correlation called the correlation coefficient or correlation index summarizes in one figure the direction and degree of correlation. Correlation analysis refers to the techniques used in measuring the closeness of the relationship between the variables. A very simple definition of correlation is that given by A.M. Tuttle. He defines correlation as: "An analysis of the covariation of two or more variables is usually called correlation."

The problem of analysing the relation between different series should be broken down into three steps:

(1) Determining whether a relation exists and, if it does, measuring it.

(2) Testing whether it is significant.

(3) Establishing the cause and effect relation, if any.


In this chapter only the first aspect will be discussed. For the second aspect a reference may be made to the chapter on Tests of Significance. The third aspect in the analysis, that of establishing the cause-effect relation, is difficult to treat statistically. An extremely high and significant correlation between the increase in smoking and increase in lung cancer would not by itself prove that smoking causes lung cancer. The proof of a cause and effect relation can be developed only by means of an exhaustive study of the operative elements themselves.

It should be noted that the detection and analysis of correlation (i.e., covariation) between two statistical variables requires a relationship of some sort which associates the observations in pairs, each pair consisting of one value of each of the two variables. In general the pairing relationship may be of almost any nature, such as observations at the same time or place, or over a period of time, or at different places.

The computation concerning the degree of closeness is based on the regression equation. However, it is possible to perform correlation analysis without actually having a regression equation.


"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation."
(Croxton and Cowden, Applied General Statistics)

Utility of the Study of Correlation


The study of correlation is of immense use in practical life because of the following reasons:

1. Most variables show some kind of relationship. For example, there is a relationship between price and supply, income and expenditure, etc. With the help of correlation analysis we can measure in one figure the degree of relationship existing between the variables.


2. Once we know that two variables are closely related, we can 
estimate the value of one variable given the value of another. This is done 
with the help of regression analysis discussed in the next chapter. 


3. Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilizing forces become effective.


In business, correlation analysis enables the executive to estimate costs, sales, prices and other variables on the basis of some other series with which these costs, sales or prices may be functionally related. Some of the guesswork can be removed from decisions when the relationship between a variable to be estimated and the one or more other variables on which it depends is close and reasonably invariant.


However, it should be noted that the coefficient of correlation is one of the most widely used and also one of the most widely abused of statistical measures. It is abused in the sense that one sometimes overlooks the fact that correlation measures nothing but the strength of linear relationships and that it does not necessarily imply a cause-effect relationship.


4. Progressive development in the methods of science and philosophy has been characterized by increase in the knowledge of relationships or correlations. In nature also one finds a multiplicity of interrelated forces.


Correlation and Causation 


Correlation analysis helps us in determining the degree of relationship between two or more variables; it does not tell us anything about cause and effect relationship. Even a high degree of correlation does not necessarily mean that a relationship of cause and effect exists between the variables or, simply stated, correlation does not necessarily imply causation or functional relationship, though the existence of causation always implies correlation. By itself it establishes only covariation. The explanation of a significant degree of correlation may be any one, or a combination, of the following reasons:


1. The correlation may be due to pure chance, especially in a small sample. We may get a high degree of correlation between two variables in a sample but in the universe there may not be any relationship between the variables at all. This is especially so in the case of small samples. Such a correlation may arise either because of pure random sampling variation or because of the bias of the investigator in selecting the sample. The following example shall illustrate the point:

Income (Rs.) : 500  600  700  800  900
Weight (lb.) : 120  140  160  180  200

The above data show a perfect positive relationship between income and weight, i.e., as the income is increasing the weight is increasing and the rate of change between the two variables is the same.


2. Both the correlated variables may be influenced by one or more other variables. It is just possible that a high degree of correlation between the variables may be due to the same causes affecting each variable or different causes affecting each with the same effect. For example, a high degree of correlation between the yield per acre of rice and tea may be due to the fact that both are related to the amount of rainfall. But neither of the two variables is the cause of the other.


3. Both the variables may be mutually influencing each other so that neither can be designated as the cause and the other the effect. There may be a high degree of correlation between the variables but it may be difficult to pinpoint which is the cause and which is the effect. This is especially likely to be so in the case of economic variables. For example, such variables as demand and supply, price and production, etc., mutually interact. To take a specific case, it is a well known principle of economics that as the price of a commodity increases its demand goes down, and so price is the cause and demand the effect. But it is also possible that increased demand for a commodity due to growth of population or other reasons may exercise an upward pressure on price. Now the cause is the increased demand, the effect the price. Thus at times it may become difficult to explain from the two correlated variables which is the cause and which is the effect, because both may be reacting on each other.


The above points clearly bring out the fact that a mathematical relationship implies nothing in itself about cause and effect. In general, if factors A and B are correlated, it may be that (1) A causes B, to be sure, but it might also be that (2) B causes A, (3) A and B influence each other continuously or intermittently, (4) A and B are both influenced by C, or (5) the correlation is due to chance. In many instances an extremely high degree of correlation between two variables may be obtained when no meaning can be attached to the answer. There is, for example, extremely high correlation between some series representing the production of pigs and the production of pig iron, yet no one has ever believed that this correlation has any meaning or that it indicates the existence of a cause-effect relation. By itself, it establishes only covariation. Correlations observed between variables that cannot conceivably be causally related are called spurious or nonsense correlations. More appropriately, we should remember that it is the interpretation of the degree of correlation that is spurious, not the degree of correlation itself. The high degree of correlation indicates only the mathematical result. We should reach a conclusion based on logical reasoning and intelligent investigation of significantly related matters. It may also be pointed out that errors in correlation analysis include not only reading causation into spurious correlation but also interpreting spuriously a perfectly valid relationship.
Types of Correlation 


Correlation is described or classified in several different ways. Three of the most important ways of classifying correlation are:

(i) Positive or negative,

(ii) Simple, partial and multiple,

(iii) Linear and non-linear.

(i) Positive and Negative Correlation. Whether correlation is positive (direct) or negative (inverse) would depend upon the direction of change of the variables. If both the variables are varying in the same direction, i.e., if as one variable is increasing the other on an average is also increasing or, if as one variable is decreasing the other on an average is also decreasing, correlation is said to be positive. If, on the other hand, the variables are varying in opposite directions, i.e., as one variable is increasing the other is decreasing or vice versa, correlation is said to be negative. The following examples would illustrate the difference between positive and negative correlation:

I. POSITIVE CORRELATION
X :  10  12  15  18  20        X :  80  70  60  40  30
Y :   …  20  22  25  37        Y :  50  45  30  20  10

II. NEGATIVE CORRELATION
X :  20  30  40  60  80        X : 100  90  60  40  30
Y :  40  30  22  15  10        Y :  10  20  30  40  50


(ii) Simple, Partial and Multiple Correlation*. The distinction between simple, partial and multiple correlation is based upon the number of variables studied. When only two variables are studied it is a problem of simple correlation. When three or more variables are studied it is a problem of either multiple or partial correlation. In multiple correlation three or more variables are studied simultaneously. For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizers used, it is a problem of multiple correlation. On the other hand, in partial correlation we recognise more than two variables, but consider only two variables to be influencing each other, the effect of other influencing variables being kept constant. For example, in the rice problem taken above, if we limit our correlation analysis of yield and rainfall to periods when a certain average daily temperature existed, it becomes a problem of partial correlation. In this chapter we shall study problems relating to simple correlation only.

(iii) Linear and Non-linear (Curvilinear) Correlation. The distinction between linear and non-linear correlation is based upon the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear. For example, observe the following two variables X and Y:

X :  10   20   30   40   50
Y :  70  140  210  280  350

It is clear that the ratio of change between the two variables is the same. If such variables are plotted on a graph paper, all the plotted points would fall on a straight line.


* For a detailed discussion of Partial and Multiple Correlation Analysis, please refer to Chapter 9, Vol. II.

Correlation would be called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example, if we double the amount of rainfall, the production of rice or wheat, etc., would not necessarily be doubled. It may be pointed out that in most practical situations we find a non-linear relationship between the variables. However, since techniques of analysis for measuring non-linear correlation are far more complicated than those for linear correlation, we generally make an assumption that the relationship between the variables is of the linear type.

The following two diagrams will illustrate the difference between linear and curvilinear correlation:

[Diagrams: Positive linear correlation; Curvilinear correlation]
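The constant-ratio test for linearity described under (iii) can be checked mechanically. This is a minimal sketch (the function name is our own) that uses exact fractions to compare the successive ratios ΔY/ΔX; it assumes no two consecutive X values are equal.

```python
from fractions import Fraction


def change_ratios(x, y):
    """Ratios of successive changes, Delta Y / Delta X, between consecutive pairs."""
    return [Fraction(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(zip(x, y), zip(x[1:], y[1:]))]


x = [10, 20, 30, 40, 50]
y = [70, 140, 210, 280, 350]
ratios = change_ratios(x, y)
print(ratios)               # each ratio equals 7
print(len(set(ratios)) == 1)  # True: a constant ratio, so the points lie on a straight line
```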


METHODS OF STUDYING CORRELATION

The various methods of ascertaining whether two variables are correlated or not are:
I. Scatter Diagram Method
II. Graphic Method
III. Karl Pearson's Coefficient of Correlation
IV. Rank Method
V. Concurrent Deviation Method
VI. Method of Least Squares.*
Of these, the first two are based on the knowledge of diagrams and graphs, whereas the others are mathematical methods. Each of these methods shall be discussed in detail in the following pages.


I. SCATTER DIAGRAM METHOD

The simplest device for ascertaining whether two variables are related
is to prepare a dot chart called a scatter diagram.† When this method
is used, the given data are plotted on graph paper in the form of dots,
i.e., for each pair of X and Y values we put a dot and thus obtain as
many points as there are observations. By looking at the scatter of
the various points we can form an idea as to whether the variables are
related or not. The greater the scatter of the plotted points on the chart,
the weaker the relationship between the two variables. The more closely the
points come to a straight line, the higher the degree of relationship. If
all the points lie on a straight line rising from the lower left-hand corner
to the upper right-hand corner, correlation is said to be perfectly positive
(i.e., r = +1) (diagram I). On the other hand, if all the points lie
on a straight line falling from the upper left-hand corner to the lower
right-hand corner of the diagram, correlation is said to be perfectly
negative (i.e., r = −1) (diagram II).

[Diagrams I and II: Perfect positive correlation (r = +1) | Perfect negative correlation (r = −1)]

If the plotted points fall in a narrow band, there is a high degree of
correlation between the variables: correlation is positive if the points show
a rising tendency from the lower left-hand corner to the upper right-hand
corner (diagram III) and negative if the points show a declining tendency
from the upper left-hand corner to the lower right-hand corner of the
diagram (diagram IV).

[Diagrams III and IV: High degree of positive correlation | High degree of negative correlation]

On the other hand, if the points are widely scattered over the diagram, it
indicates very little relationship between the variables: correlation is
positive if the points are rising from the lower left-hand corner to the
upper right-hand corner (diagram V) and negative if the points are running
from the upper left-hand side to the lower right-hand side of the diagram
(diagram VI). If the plotted points lie on a straight line parallel to the
x-axis, or in a haphazard manner, it shows the absence of any relationship
between the variables (i.e., r = 0), as shown by diagram VII.

[Diagrams V and VI: Low degree of positive correlation | Low degree of negative correlation]

* The method is discussed in detail in Chapter 11 entitled 'Regression
Analysis'.
† The method is so called because it indicates the scatter of the various points.


Illustration 1. Given the following pairs of values of the variables X and Y :

X: 2 3 5 6 8 9
Y: 6 5 7 8 12 11

(a) Make a scatter diagram.
(b) Do you think that there is any correlation between the variables X and Y?
Is it positive or negative? Is it high or low?
(c) By graphic inspection, draw an estimating line.* (B. Com., Delhi, 1969)

* An estimating line or regression line is a line of average relationship. For
details please see next Chapter on 'Regression Analysis'.
Solution. By looking at the scatter diagram we can say that the variables X
and Y are correlated. Further, correlation is positive because the trend of the points
is upward, rising from the lower left-hand corner to the upper right-hand corner of the
diagram. The diagram also indicates that the degree of relationship is high because
the plotted points are near to the line which shows perfect relationship between the
variables.

[Figure: Scatter diagram of X and Y]
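Where graph paper is not at hand, a rough scatter diagram can be sketched in text. The Python sketch below plots the data of Illustration 1 with one character per unit; it assumes integer-valued X and Y (as here), and `text_scatter` is a hypothetical helper written only for this illustration.

```python
def text_scatter(xs, ys):
    """Crude text scatter diagram: one string per row, top row = largest Y.

    Assumes integer-valued data so that each unit gets its own column/row.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (x_max - x_min + 1) for _ in range(y_max - y_min + 1)]
    for x, y in zip(xs, ys):
        grid[y_max - y][x - x_min] = "*"   # row 0 holds the largest Y value
    return ["".join(row) for row in grid]

X = [2, 3, 5, 6, 8, 9]
Y = [6, 5, 7, 8, 12, 11]

rows = text_scatter(X, Y)
for y_value, row in zip(range(max(Y), min(Y) - 1, -1), rows):
    print(f"{y_value:>3} |{row}")
print("    +" + "-" * (max(X) - min(X) + 1))
```

Reading the printed rows from the bottom up, the stars drift from left to right, which is the upward tendency described in the solution.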


Merits and Limitations of the Method 


Merits. 1. It is a simple and non-mathematical method of studying
correlation between the variables. As such it can be easily understood
and a rough idea can very quickly be formed as to whether or not
the variables are related.

2. It is not influenced by the size of extreme items, whereas most
of the mathematical methods of finding correlation are influenced by
extreme items.

3. Making a scatter diagram is usually the first step in investigating
the relationship between two variables.

Limitations. By applying this method we can get an idea about
the direction of correlation and also whether it is high or low. But we
cannot establish the exact degree of correlation between the variables, as is
possible by applying the mathematical methods.


II. GRAPHIC METHOD

When this method is used, the individual values of the two variables
are plotted on graph paper. We thus obtain two curves, one for the X
variable and another for the Y variable. By examining the direction and
closeness of the two curves so drawn we can infer whether or not the
variables are related. If both the curves drawn on the graph move in
the same direction (either upward or downward), correlation is said to be
positive. On the other hand, if the curves move in opposite
directions, correlation is said to be negative. The following example
illustrates the method :

Illustration 2. From the following data ascertain whether the income and
expenditure of the 100 workers of a factory are correlated :
Year   Average income   Average expenditure     Year   Average income   Average expenditure
       (in Rs.)         (in Rs.)                       (in Rs.)         (in Rs.)
1957   100              90                      1962   112              94
1958   102              91                      1963   118              100
1959   105              93                      1964   120              106
1960   105              95                      1965   125              108
1961   108              92                      1966   130              110

[Figure: Average income and average expenditure plotted against years, 1957-1966]

The graph shows that the variables, income and expenditure, are closely related.
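The graphic method rests on whether the two curves move in the same direction from year to year. As a rough numerical companion to the graph (an assumption of this sketch, not part of the method as described), the Python code below records the direction of each year-to-year change in the data of Illustration 2 and counts how often the two series move together. The idea is close to that of the Concurrent Deviation Method listed earlier; `directions` is an illustrative name.

```python
def directions(series):
    """+1 (rise), -1 (fall) or 0 (no change) for each year-to-year change."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

income      = [100, 102, 105, 105, 108, 112, 118, 120, 125, 130]   # 1957-1966
expenditure = [ 90,  91,  93,  95,  92,  94, 100, 106, 108, 110]

d_inc = directions(income)
d_exp = directions(expenditure)
same = sum(1 for a, b in zip(d_inc, d_exp) if a == b)

print("income     :", d_inc)
print("expenditure:", d_exp)
print(f"changes in the same direction: {same} of {len(d_inc)}")
```

The two series move in the same direction in 7 of the 9 yearly changes, which agrees with the visual impression that the curves are closely related.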


This method is normally used where we are given data over a period
of time, i.e., in the case of time series. However, as with the scatter diagram
method, this method also does not give a numerical value describing
the extent to which the variables are related.


III. KARL PEARSON'S COEFFICIENT OF CORRELATION

Of the several mathematical methods of measuring correlation, Karl
Pearson's method, popularly known as the Pearsonian coefficient of
correlation, is the most widely used in practice. The Pearsonian coefficient of
correlation is denoted by the symbol r. It is one of the very few symbols
that are used universally for describing the degree of correlation between
two series. The formula for computing Pearsonian r is :

$$r = \frac{\sum xy}{N\,\sigma_x\,\sigma_y} \qquad \text{(i)}$$

Here $x = (X - \bar{X})$ ; $y = (Y - \bar{Y})$
$\sigma_x$ = Standard deviation of series X
$\sigma_y$ = Standard deviation of series Y
N = Number of pairs of observations
r = the (product moment) correlation coefficient.
This method is to be applied only where the deviations of items are
taken from actual means and not from assumed means.
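Formula (i) can be written directly in code. The Python sketch below computes the means, the deviations x and y, the covariance $\sum xy / N$ and the two standard deviations (with divisor N, as in formula (i)), and then divides; it uses the data of Illustration 3 below, for which the text obtains r = 0.95. The function name is illustrative.

```python
import math

def pearson_r_formula_i(X, Y):
    """r = Σxy / (N σx σy), with deviations x, y taken from the actual means."""
    N = len(X)
    mean_x = sum(X) / N
    mean_y = sum(Y) / N
    x = [xi - mean_x for xi in X]
    y = [yi - mean_y for yi in Y]
    covariance = sum(a * b for a, b in zip(x, y)) / N    # Σxy / N
    sigma_x = math.sqrt(sum(a * a for a in x) / N)       # population SD (divisor N)
    sigma_y = math.sqrt(sum(b * b for b in y) / N)
    return covariance / (sigma_x * sigma_y)

# Data of Illustration 3
X = [9, 8, 7, 6, 5, 4, 3, 2, 1]
Y = [15, 16, 14, 13, 11, 12, 10, 8, 9]
print(round(pearson_r_formula_i(X, Y), 2))
```

Using the divisor N − 1 in the covariance and in both standard deviations would give the same r, because the divisors cancel; what matters is using the same divisor throughout.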


The value of the coefficient of correlation as obtained by the above
formula always lies between −1 and +1. When r = +1, it means there is
perfect positive correlation between the variables. When r = −1, it means
there is perfect negative correlation between the variables. When r = 0,
it means there is no (linear) relationship between the two variables. However, in
practice such values of r as +1, −1 and 0 are rare. We normally get
values which lie between +1 and −1, such as +0.8, −0.4, etc. The
coefficient of correlation describes not only the magnitude of correlation but also
its direction. Thus, +0.8 would mean that correlation is positive because
the sign of r is + and the magnitude of correlation is 0.8. Similarly, −0.4
means correlation is negative and its magnitude is 0.4.

The above formula for computing the Pearsonian coefficient of correlation
can be transformed to the following form, which is easier to apply :

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
where $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$.

It is obvious that while applying this formula we do not have to calculate
separately the standard deviations of the X and Y series as is required
by formula (i). This greatly simplifies the task of calculating the correlation
coefficient.

Steps. (i) Take the deviations of X series from the mean of X and
denote these deviations by x.

(ii) Square these deviations and obtain the total, i.e., $\sum x^2$.

(iii) Take the deviations of Y series from the mean of Y and denote
these deviations by y.

(iv) Square these deviations and obtain the total, i.e., $\sum y^2$.

(v) Multiply the deviations of X and Y series and obtain the total,
i.e., $\sum xy$.

(vi) Substitute the values of $\sum xy$, $\sum x^2$ and $\sum y^2$ in the above formula.

The following examples will illustrate the procedure :


*The coefficient of correlation is said to be a measure of covariance between two
series. The covariance of two series X and Y is written as :

$$\text{Covariance} = \frac{\sum xy}{N}$$

where x and y stand for deviations of X and Y series from their respective means.

In order to find the value of the correlation coefficient we first calculate the
covariance and then, in order to convert it to a relative measure, divide the covariance
by the standard deviations of the two series. The ratio so obtained is called Karl
Pearson's correlation coefficient :

$$r = \frac{\sum xy / N}{\sigma_x\,\sigma_y} = \frac{\sum xy}{N\,\sigma_x\,\sigma_y}$$
Illustration 3. Calculate the coefficient of correlation from the following data :

X: 9 8 7 6 5 4 3 2 1
Y: 15 16 14 13 11 12 10 8 9

(B. Com., Punjabi University, 1975)
Solution : CALCULATION OF COEFFICIENT OF CORRELATION

X    x = (X-5)   x²    Y    y = (Y-12)   y²    xy
9    +4          16    15   +3           9     12
8    +3          9     16   +4           16    12
7    +2          4     14   +2           4     4
6    +1          1     13   +1           1     1
5    0           0     11   -1           1     0
4    -1          1     12   0            0     0
3    -2          4     10   -2           4     4
2    -3          9     8    -4           16    12
1    -4          16    9    -3           9     12
ΣX = 45   Σx = 0   Σx² = 60   ΣY = 108   Σy = 0   Σy² = 60   Σxy = 57

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
where $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$;
$\bar{X} = \frac{45}{9} = 5$ ; $\bar{Y} = \frac{108}{9} = 12$ ; $\sum xy = 57$, $\sum x^2 = 60$, $\sum y^2 = 60$

$$r = \frac{57}{\sqrt{60 \times 60}} = \frac{57}{60} = 0.95$$
Illustration 4. Making use of the data summarized below, calculate the
coefficient of correlation, $r_{12}$* :

Case   X1   X2      Case   X1   X2
A      10   9       E      12   11
B      6    4       F      13   13
C      9    6       G      11   8
D      10   9       H      9    4

(B. Com., Delhi, 1969)
Solution : CALCULATION OF COEFFICIENT OF CORRELATION

Case   X1   x1 = (X1-10)   x1²   X2   x2 = (X2-8)   x2²   x1x2
A      10   0              0     9    +1            1     0
B      6    -4             16    4    -4            16    16
C      9    -1             1     6    -2            4     2
D      10   0              0     9    +1            1     0
E      12   +2             4     11   +3            9     6
F      13   +3             9     13   +5            25    15
G      11   +1             1     8    0             0     0
H      9    -1             1     4    -4            16    4
N = 8   ΣX1 = 80   Σx1 = 0   Σx1² = 32   ΣX2 = 64   Σx2 = 0   Σx2² = 72   Σx1x2 = 43

$\bar{X}_1 = \frac{80}{8} = 10$ ; $\bar{X}_2 = \frac{64}{8} = 8$

$$r_{12} = \frac{\sum x_1 x_2}{\sqrt{\sum x_1^2 \times \sum x_2^2}}$$
$\sum x_1 x_2 = 43$, $\sum x_1^2 = 32$, $\sum x_2^2 = 72$
Substituting the values,
$$r_{12} = \frac{43}{\sqrt{32 \times 72}} = \frac{43}{\sqrt{2304}} = \frac{43}{48} = 0.896$$
Note. It may be noted that the above formula is the same as given
earlier, i.e.,
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
The only difference is that of the symbols. Since in this question we were given
series $X_1$ and $X_2$, we changed the symbols in the formula accordingly.
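Steps (i) to (vi) for the simplified formula $r = \sum xy / \sqrt{\sum x^2 \times \sum y^2}$ translate line by line into code. A minimal Python sketch, checked against Illustrations 3 and 4 (where the text obtains 0.95 and 0.896); the function name is illustrative.

```python
import math

def pearson_r_formula_ii(X, Y):
    """r = Σxy / sqrt(Σx² · Σy²), following steps (i)-(vi) in the text."""
    N = len(X)
    mean_x, mean_y = sum(X) / N, sum(Y) / N
    x = [xi - mean_x for xi in X]                   # step (i)
    y = [yi - mean_y for yi in Y]                   # step (iii)
    sum_x2 = sum(a * a for a in x)                  # step (ii)
    sum_y2 = sum(b * b for b in y)                  # step (iv)
    sum_xy = sum(a * b for a, b in zip(x, y))       # step (v)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)      # step (vi)

# Illustration 3
r3 = pearson_r_formula_ii([9, 8, 7, 6, 5, 4, 3, 2, 1],
                          [15, 16, 14, 13, 11, 12, 10, 8, 9])
# Illustration 4 (cases A to H)
r12 = pearson_r_formula_ii([10, 6, 9, 10, 12, 13, 11, 9],
                           [9, 4, 6, 9, 11, 13, 8, 4])
print(round(r3, 3), round(r12, 3))
```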
Calculation of Correlation Coefficient when Change of Scale and
Origin is made

Since r is a pure number, shifting the origin and changing the scale
of the series do not affect its value (provided the scale factors are positive;
dividing a series by a negative number would reverse the sign of r).
Illustration 5. Calculate the coefficient of correlation from the following data :

X: 100 200 300 400 500 600 700
Y: 30 50 60 80 100 110 130
Solution : COMPUTATION OF CORRELATION COEFFICIENT

X     (X-400)   x = (X-400)/100   x²   Y     (Y-80)   y = (Y-80)/10   y²   xy
100   -300      -3                9    30    -50      -5              25   15
200   -200      -2                4    50    -30      -3              9    6
300   -100      -1                1    60    -20      -2              4    2
400   0         0                 0    80    0        0               0    0
500   +100      +1                1    100   +20      +2              4    2
600   +200      +2                4    110   +30      +3              9    6
700   +300      +3                9    130   +50      +5              25   15
ΣX = 2800   Σx = 0   Σx² = 28   ΣY = 560   Σy = 0   Σy² = 76   Σxy = 46

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
$\bar{X} = \frac{2800}{7} = 400$ ; $\bar{Y} = \frac{560}{7} = 80$ ; $\sum xy = 46$, $\sum x^2 = 28$, $\sum y^2 = 76$
$$r = \frac{46}{\sqrt{28 \times 76}} = \frac{46}{\sqrt{2128}} = \frac{46}{46.13} = +0.997$$
It should be noted that had we not taken a common factor, the calculations
would have been much heavier than those above.
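The invariance of r under a change of origin and scale can be confirmed on the data of Illustration 5. The Python sketch below computes r from the raw values and from the coded values (X − 400)/100 and (Y − 80)/10; both agree with the +0.997 found above. The last calculation uses a negative divisor to show that reversing the direction of a scale reverses the sign of r, which is why the common factors should be taken as positive numbers. The helper name `pearson_r` is illustrative.

```python
import math

def pearson_r(X, Y):
    """Pearsonian r via r = Σxy / sqrt(Σx² · Σy²)."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)

X = [100, 200, 300, 400, 500, 600, 700]
Y = [30, 50, 60, 80, 100, 110, 130]

r_raw = pearson_r(X, Y)
# Shift the origins to 400 and 80, then divide by the common factors 100 and 10
r_coded = pearson_r([(v - 400) / 100 for v in X], [(v - 80) / 10 for v in Y])
# A negative scale factor reverses the sign of r
r_flipped = pearson_r([-(v - 400) / 100 for v in X], Y)

print(round(r_raw, 3), round(r_coded, 3), round(r_flipped, 3))
```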


When Deviations are taken from an Assumed Mean

When actual means are in fractions, say the actual means of X and Y
series are 20.167 and 29.723, the calculation of correlation by the method
discussed above would involve too many calculations and would take a lot
of time. In such cases we make use of the assumed mean method for
finding out correlation. When deviations are taken from an assumed
mean the following formula is applicable :

$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$$

where $d_x$ refers to deviations of X series from an assumed mean, i.e., $(X - A_x)$;
$d_y$ refers to deviations of Y series from an assumed mean, i.e., $(Y - A_y)$;
$\sum d_x d_y$ = sum of the products of the deviations of X and Y series from
their assumed means;
$\sum d_x^2$ = sum of the squares of the deviations of X series from an assumed mean;
$\sum d_y^2$ = sum of the squares of the deviations of Y series from an assumed mean;
$\sum d_x$ = sum of the deviations of X series from an assumed mean;
$\sum d_y$ = sum of the deviations of Y series from an assumed mean.

* $r_{12}$ means the coefficient of correlation between series $X_1$ and $X_2$.

It may be pointed out that there are many variations of the above
formula. For example, the above formula may be written as :

$$r = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\;\sqrt{N\sum d_y^2 - (\sum d_y)^2}}$$

But the form given above is the easiest to apply.

Steps. (i) Take the deviations of X series from an assumed mean,
denote these deviations by $d_x$ and obtain the total, i.e., $\sum d_x$.

(ii) Take the deviations of Y series from an assumed mean,
denote these deviations by $d_y$ and obtain the total, i.e., $\sum d_y$.

(iii) Square $d_x$ and obtain the total $\sum d_x^2$.
(iv) Square $d_y$ and obtain the total $\sum d_y^2$.
(v) Multiply $d_x$ by $d_y$ and obtain the total $\sum d_x d_y$.

(vi) Substitute the values of $\sum d_x d_y$, $\sum d_x$, $\sum d_y$, $\sum d_x^2$ and $\sum d_y^2$ in the
formula given above.

The following examples shall illustrate the procedure :


Illustration 6. Calculate Karl Pearson's coefficient of correlation between the
values of X and Y given below :

X: 78 89 99 60 59 79 68 61
Y: 125 137 156 112 107 136 123 108
(M.A. Econ., Meerut, 1975)

Solution : CALCULATION OF KARL PEARSON'S CORRELATION COEFFICIENT

X    dx = (X-70)   dx²   Y     dy = (Y-128)   dy²   dxdy
78   +8            64    125   -3             9     -24
89   +19           361   137   +9             81    +171
99   +29           841   156   +28            784   +812
60   -10           100   112   -16            256   +160
59   -11           121   107   -21            441   +231
79   +9            81    136   +8             64    +72
68   -2            4     123   -5             25    +10
61   -9            81    108   -20            400   +180
ΣX = 593   Σdx = 33   Σdx² = 1,653   ΣY = 1,004   Σdy = -20   Σdy² = 2,060   Σdxdy = 1,612

$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$$

$\sum d_x d_y = 1612$, $\sum d_x = 33$, $\sum d_y = -20$, $N = 8$, $\sum d_x^2 = 1653$, $\sum d_y^2 = 2060$

$$r = \frac{1612 - \dfrac{33 \times (-20)}{8}}{\sqrt{1653 - \dfrac{(33)^2}{8}}\;\sqrt{2060 - \dfrac{(-20)^2}{8}}} = \frac{1612 + 82.5}{\sqrt{1653 - 136.125}\;\sqrt{2060 - 50}} = \frac{1694.5}{\sqrt{1516.875}\;\sqrt{2010}}$$

Taking logarithms
$$\log r = \log 1694.5 - \tfrac{1}{2}(\log 1516.875 + \log 2010) = 3.2290 - \tfrac{1}{2}(3.1810 + 3.3032) = 3.2290 - 3.2421 = \bar{1}.9869$$
$$r = \text{AL}\;\bar{1}.9869 = +0.970$$
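The assumed-mean formula can be coded from the same totals used in the steps. The Python sketch below reproduces Illustration 6 with the assumed means 70 and 128 used in the solution, and then repeats the calculation with an arbitrary origin of 0 to show that the choice of assumed mean does not change r. The function name is illustrative.

```python
import math

def r_assumed_mean(X, Y, A_x, A_y):
    """Pearsonian r from deviations about assumed means A_x and A_y."""
    N = len(X)
    dx = [v - A_x for v in X]
    dy = [v - A_y for v in Y]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(d * d for d in dx)
    s_dy2 = sum(d * d for d in dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = s_dxdy - s_dx * s_dy / N
    denominator = math.sqrt(s_dx2 - s_dx ** 2 / N) * math.sqrt(s_dy2 - s_dy ** 2 / N)
    return numerator / denominator

X = [78, 89, 99, 60, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]

r_70_128 = r_assumed_mean(X, Y, 70, 128)   # assumed means used in the solution
r_zero = r_assumed_mean(X, Y, 0, 0)        # any other origin gives the same r
print(round(r_70_128, 3), round(r_zero, 3))
```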
Illustration 7. Nine students obtained the following percentage of marks in the
College Test (X) and in the Final University Examination (Y). Calculate the
correlation coefficient and find its probable error :
X: 51 63 73 46 50 60 47 36 60
Y: 49 72 74 44 58 66 50 30 55
(I.C.M.A., 1973)
Solution : COMPUTATION OF COEFFICIENT OF CORRELATION

X    dx = (X-54)   dx²   Y    dy = (Y-55)   dy²   dxdy
51   -3            9     49   -6            36    18
63   +9            81    72   +17           289   153
73   +19           361   74   +19           361   361
46   -8            64    44   -11           121   88
50   -4            16    58   +3            9     -12
60   +6            36    66   +11           121   66
47   -7            49    50   -5            25    35
36   -18           324   30   -25           625   450
60   +6            36    55   0             0     0
Σdx = 0   Σdx² = 976   Σdy = 3   Σdy² = 1,587   Σdxdy = 1,159

$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$$
$\sum d_x d_y = 1159$, $\sum d_x = 0$, $\sum d_y = 3$, $\sum d_x^2 = 976$, $\sum d_y^2 = 1587$, $N = 9$

$$r = \frac{1159 - \dfrac{(0)(3)}{9}}{\sqrt{976 - \dfrac{(0)^2}{9}}\;\sqrt{1587 - \dfrac{(3)^2}{9}}} = \frac{1159}{\sqrt{976}\;\sqrt{1586}}$$

Taking logarithms
$$\log r = \log 1159 - \tfrac{1}{2}(\log 976 + \log 1586) = 3.0641 - \tfrac{1}{2}(2.9894 + 3.2003) = 3.0641 - 3.0949 = \bar{1}.9692$$
$$r = \text{AL}\;\bar{1}.9692 = 0.932$$
Probable error of r is obtained by the following formula :

$$\text{P.E.}_r = 0.6745\,\frac{1 - r^2}{\sqrt{N}}$$

r = 0.932, N = 9

$$\text{P.E.}_r = 0.6745 \times \frac{1 - (0.932)^2}{\sqrt{9}} = 0.6745 \times \frac{0.1314}{3} = 0.0295$$
Illustration 8. The following table gives the soil temperature and the germination
time at various places. Calculate the correlation coefficient and interpret its
value.

Temperature   Germination time   Temperature   Germination time
57            10                 42            27
42            26                 44            19
40            30                 40            18
38            41                 46            19
42            29                 44            31
45            27                 43            29
Take 44 and 26 as assumed means. (B. Com., Lucknow, 1973)

Solution. Let temperature be denoted by X and germination time by Y.

CALCULATION OF CORRELATION COEFFICIENT TAKING
DEVIATIONS OF X AND Y FROM 44 AND 26 RESPECTIVELY

X    dx = (X-44)   dx²   Y    dy = (Y-26)   dy²   dxdy
57   +13           169   10   -16           256   -208
42   -2            4     26   0             0     0
40   -4            16    30   +4            16    -16
38   -6            36    41   +15           225   -90
42   -2            4     29   +3            9     -6
45   +1            1     27   +1            1     +1
42   -2            4     27   +1            1     -2
44   0             0     19   -7            49    0
40   -4            16    18   -8            64    +32
46   +2            4     19   -7            49    -14
44   0             0     31   +5            25    0
43   -1            1     29   +3            9     -3
N = 12   Σdx = -5   Σdx² = 255   Σdy = -6   Σdy² = 704   Σdxdy = -306

$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$$
$\sum d_x d_y = -306$, $\sum d_x = -5$, $\sum d_y = -6$, $\sum d_x^2 = 255$, $\sum d_y^2 = 704$, $N = 12$

$$r = \frac{-306 - \dfrac{(-5)(-6)}{12}}{\sqrt{255 - \dfrac{(-5)^2}{12}}\;\sqrt{704 - \dfrac{(-6)^2}{12}}} = \frac{-306 - 2.5}{\sqrt{252.9}\;\sqrt{701}} = \frac{-308.5}{\sqrt{252.9}\;\sqrt{701}}$$

Taking logarithms (of the numerical value)
$$\log |r| = \log 308.5 - \tfrac{1}{2}(\log 252.9 + \log 701) = 2.4893 - \tfrac{1}{2}(2.4029 + 2.8457) = 2.4893 - 2.6243 = \bar{1}.8650$$
$$|r| = \text{AL}\;\bar{1}.8650 = 0.733$$
Since the numerator is negative, r = −0.733.
Thus there is a high degree of negative correlation between temperature and
germination time.
Illustration 9. The following table gives the distribution of the total population
and those who are wholly or partially blind among them. Find out if there is any
relation between age and blindness.

Age     No. of persons    Blind     Age     No. of persons    Blind
        (in thousands)                      (in thousands)
0-10    100               55        40-50   24                36
10-20   60                40        50-60   11                22
20-30   40                40        60-70   6                 18
30-40   36                40        70-80   3                 15

(B. Com., Punjab, 1972 ; B. Com., Madurai, 1973)
Solution. To facilitate comparison we must express the number of blind persons
in terms of a common denominator, say, 1 lakh. The first figure would remain as it is
because 55 persons are blind out of 100 thousand, i.e., 1 lakh. The second value would
be obtained like this :
Out of 60,000 persons, number of blind = 40
Out of 1,00,000 persons, number of blind = (40/60,000) × 1,00,000 = 66.67, i.e., 67
In a similar manner the other values can be obtained.

Age     Mid-point   dx =        dx²   Blind per   dy =      dy²      dxdy
        X           (X-35)/10         lakh Y      (Y-185)
0-10    5           -3          9     55          -130      16,900   390
10-20   15          -2          4     67          -118      13,924   236
20-30   25          -1          1     100         -85       7,225    85
30-40   35          0           0     111         -74       5,476    0
40-50   45          +1          1     150         -35       1,225    -35
50-60   55          +2          4     200         +15       225      30
60-70   65          +3          9     300         +115      13,225   345
70-80   75          +4          16    500         +315      99,225   1,260
N = 8   Σdx = 4   Σdx² = 44   Σdy = 3   Σdy² = 1,57,425   Σdxdy = 2,311

$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$$
$N = 8$, $\sum d_x d_y = 2311$, $\sum d_x = 4$, $\sum d_y = 3$, $\sum d_x^2 = 44$, $\sum d_y^2 = 157425$
Substituting these values,
$$r = \frac{2311 - \dfrac{4 \times 3}{8}}{\sqrt{44 - \dfrac{(4)^2}{8}}\;\sqrt{157425 - \dfrac{(3)^2}{8}}} = \frac{2309.5}{\sqrt{42}\;\sqrt{157423.88}}$$

Taking logarithms
$$\log r = \log 2309.5 - \tfrac{1}{2}(\log 42 + \log 157423.88) = 3.3635 - \tfrac{1}{2}(1.6232 + 5.1971) = 3.3635 - 3.4102 = \bar{1}.9533$$
$$r = \text{AL}\;\bar{1}.9533 = 0.898$$
Thus there is a high degree of positive correlation between age and blindness.


*Illustration 10. Given :

Total of the products of deviations of X and Y series = 3,044
Number of pairs of observations = 10
Total of the deviations of X series = −170
Total of the deviations of Y series = −20
Total of the squares of deviations of X series = 8,288
Total of the squares of deviations of Y series = 2,264

Find out the coefficient of correlation when the arbitrary means of X series and
Y series are 82 and 68 respectively.

Solution. We are given :
$\sum d_x d_y = 3044$, $\sum d_x = -170$, $\sum d_y = -20$, $\sum d_x^2 = 8288$, $\sum d_y^2 = 2264$, $N = 10$.

Applying the formula
$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}}$$
$$r = \frac{3044 - \dfrac{(-170)(-20)}{10}}{\sqrt{8288 - \dfrac{(-170)^2}{10}}\;\sqrt{2264 - \dfrac{(-20)^2}{10}}} = \frac{3044 - 340}{\sqrt{8288 - 2890}\;\sqrt{2264 - 40}} = \frac{2704}{\sqrt{5398}\;\sqrt{2224}}$$

Taking logarithms
$$\log r = \log 2704 - \tfrac{1}{2}(\log 5398 + \log 2224) = 3.4320 - \tfrac{1}{2}(3.7322 + 3.3471) = 3.4320 - 3.5397 = \bar{1}.8923$$
$$r = \text{AL}\;\bar{1}.8923 = +0.780$$
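When only the totals are given, as in Illustration 10, the formula can be evaluated from them directly. The Python sketch below applies it to the totals of Illustrations 8, 9 and 10 as they appear in the solutions; `r_from_totals` is an illustrative name, not a library function.

```python
import math

def r_from_totals(N, s_dxdy, s_dx, s_dy, s_dx2, s_dy2):
    """Assumed-mean formula for r, working only from the column totals."""
    numerator = s_dxdy - s_dx * s_dy / N
    denominator = math.sqrt(s_dx2 - s_dx ** 2 / N) * math.sqrt(s_dy2 - s_dy ** 2 / N)
    return numerator / denominator

# (N, Σdxdy, Σdx, Σdy, Σdx², Σdy²) as given in the worked illustrations
cases = {
    "Illustration 8": (12, -306, -5, -6, 255, 704),
    "Illustration 9": (8, 2311, 4, 3, 44, 157425),
    "Illustration 10": (10, 3044, -170, -20, 8288, 2264),
}
for name, totals in cases.items():
    print(name, round(r_from_totals(*totals), 3))
```

The results agree with the values obtained above by logarithms (−0.733, 0.898 and 0.780).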
Another form of the formula when deviations are taken from assumed means is

$$r = \frac{\sum d_x d_y - N(\bar{X} - A_x)(\bar{Y} - A_y)}{N\,\sigma_x\,\sigma_y}$$

where $\sum d_x d_y$ = sum of the products of deviations from the assumed averages,
$\bar{X}$ = actual mean of the X series,
$\bar{Y}$ = actual mean of the Y series,
$A_x$ = assumed mean of the X series,
$A_y$ = assumed mean of the Y series.
The following example shall illustrate the application of the above formula :

*Illustration 11. Given :
Number of pairs of observations of X and Y series = 8
X series arithmetic average = 74.5
X series assumed average = 69.0
X series standard deviation = 13.07
Y series arithmetic average = 125.5
Y series assumed average = 112
Y series standard deviation = 15.85
Summation of products of corresponding deviations of X and Y series = 2,176
Calculate the coefficient of correlation.
(Advanced Business Statistics, Rajasthan, 1969)

Solution :
$$r = \frac{\sum d_x d_y - N(\bar{X} - A_x)(\bar{Y} - A_y)}{N\,\sigma_x\,\sigma_y}$$
where the symbols have the meanings given above.
We are given : $\sum d_x d_y = 2176$, $\bar{X} = 74.5$, $A_x = 69$, $\bar{Y} = 125.5$, $A_y = 112$,
$\sigma_x = 13.07$, $\sigma_y = 15.85$ and $N = 8$.
Substituting the values in the formula,
$$r = \frac{2176 - 8(74.5 - 69)(125.5 - 112)}{8 \times 13.07 \times 15.85} = \frac{2176 - 594}{1657.28} = \frac{1582}{1657.28} = 0.955$$
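The alternative form, based on the actual means and standard deviations, is equally short in code. This Python sketch evaluates it for Illustration 11 and, as a cross-check, for the raw data of Illustration 6, where the standard deviations are computed with divisor N; the function name is illustrative.

```python
import math

def r_actual_mean_form(N, s_dxdy, mean_x, mean_y, A_x, A_y, sigma_x, sigma_y):
    """r = [Σdxdy - N(X̄ - Ax)(Ȳ - Ay)] / (N σx σy), deviations about assumed means."""
    return (s_dxdy - N * (mean_x - A_x) * (mean_y - A_y)) / (N * sigma_x * sigma_y)

# Illustration 11: only summary figures are given
r11 = r_actual_mean_form(N=8, s_dxdy=2176, mean_x=74.5, mean_y=125.5,
                         A_x=69, A_y=112, sigma_x=13.07, sigma_y=15.85)
print(round(r11, 3))

# Cross-check on the raw data of Illustration 6 (assumed means 70 and 128)
X6 = [78, 89, 99, 60, 59, 79, 68, 61]
Y6 = [125, 137, 156, 112, 107, 136, 123, 108]
N6 = len(X6)
mean_x6, mean_y6 = sum(X6) / N6, sum(Y6) / N6
sigma_x6 = math.sqrt(sum((v - mean_x6) ** 2 for v in X6) / N6)   # divisor N
sigma_y6 = math.sqrt(sum((v - mean_y6) ** 2 for v in Y6) / N6)
s_dxdy6 = sum((a - 70) * (b - 128) for a, b in zip(X6, Y6))
r6 = r_actual_mean_form(N6, s_dxdy6, mean_x6, mean_y6, 70, 128, sigma_x6, sigma_y6)
print(round(r6, 3))
```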
Correlation of Grouped Data
When the number of observations is large, the data are often
classified into a two-way frequency distribution called a correlation
table*. The class intervals for Y are listed in the captions or
column headings, and those for X are listed in the stubs at the left of the
table (the order can also be reversed). The frequencies for each cell of the
table are determined either by tallying or by card sorting, just as in the case of
a frequency distribution of a single variable.
The formula for calculating the coefficient of correlation is :

$$r = \frac{\sum f d_x d_y - \dfrac{\sum f d_x \times \sum f d_y}{N}}{\sqrt{\sum f d_x^2 - \dfrac{(\sum f d_x)^2}{N}}\;\sqrt{\sum f d_y^2 - \dfrac{(\sum f d_y)^2}{N}}}$$

* A correlation table is also called a bivariate frequency table since it shows the
frequency distribution of two related variables.
Note. This formula is the same as the one discussed above for
assumed means. The only difference is that here the deviations are also
multiplied by the frequencies.

Steps. (i) Take the step deviations of variable X and denote these
deviations by $d_x$.

(ii) Take the step deviations of variable Y and denote these
deviations by $d_y$.
(iii) Multiply $d_x d_y$ by the respective frequency of each cell and
write the figure obtained in the right-hand upper corner of each cell.

(iv) Add together all the corner values calculated in step (iii)
and obtain the total $\sum f d_x d_y$.

(v) Multiply the frequencies of variable X by the deviations of
X and obtain the total $\sum f d_x$.

(vi) Take the squares of the deviations of variable X,
multiply them by the respective frequencies and obtain $\sum f d_x^2$.

(vii) Multiply the frequencies of variable Y by the deviations of
Y and obtain the total $\sum f d_y$.

(viii) Take the squares of the deviations of variable Y, multiply
them by the respective frequencies and obtain $\sum f d_y^2$.

(ix) Substitute the values of $\sum f d_x d_y$, $\sum f d_x$, $\sum f d_x^2$, $\sum f d_y$ and $\sum f d_y^2$ in
the above formula and obtain the value of r.


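Illustration 12. The following table gives the frequency, according to age
groups, of marks obtained by 67 students in an intelligence test. Measure the degree
of relationship between age and intelligence test.

                  Age in years
Test marks      18    19    20    21
200-250         4     4     2     1
250-300         3     5     4     2
300-350         2     6     8     5
350-400         1     4     6     10

(B. Com., Nagpur, 1972 ; M.A. Econ., Jabalpur, 1973)
Solution : Let age be denoted by X and test marks by Y. Deviations of age are taken
from 19 years ($d_x = X - 19$) and step deviations of marks from the mid-value 275
in units of 50 ($d_y = (Y - 275)/50$). The figures in brackets are the cell
products $f d_x d_y$.

CALCULATION OF COEFFICIENT OF CORRELATION

                     Age in years (X)
Test marks (Y)  dy   18 (-1)   19 (0)   20 (+1)   21 (+2)    f    fdy   fdy²   fdxdy
200-250         -1   4 [4]     4 [0]    2 [-2]    1 [-2]     11   -11   11     0
250-300         0    3 [0]     5 [0]    4 [0]     2 [0]      14   0     0      0
300-350         +1   2 [-2]    6 [0]    8 [8]     5 [10]     21   21    21     16
350-400         +2   1 [-2]    4 [0]    6 [12]    10 [40]    21   42    84     50
f                    10        19       20        18         N = 67   Σfdy = 52   Σfdy² = 116   Σfdxdy = 66
fdx                  -10       0        20        36         Σfdx = 46
fdx²                 10        0        20        72         Σfdx² = 102
fdxdy                0         0        18        48         66

$$r = \frac{\sum f d_x d_y - \dfrac{\sum f d_x \times \sum f d_y}{N}}{\sqrt{\sum f d_x^2 - \dfrac{(\sum f d_x)^2}{N}}\;\sqrt{\sum f d_y^2 - \dfrac{(\sum f d_y)^2}{N}}}$$
$\sum f d_x d_y = 66$, $\sum f d_x = 46$, $\sum f d_y = 52$, $\sum f d_x^2 = 102$, $\sum f d_y^2 = 116$, $N = 67$

$$r = \frac{66 - \dfrac{46 \times 52}{67}}{\sqrt{102 - \dfrac{(46)^2}{67}}\;\sqrt{116 - \dfrac{(52)^2}{67}}} = \frac{66 - 35.7}{\sqrt{102 - 31.6}\;\sqrt{116 - 40.4}} = \frac{30.3}{\sqrt{70.4}\;\sqrt{75.6}}$$
Taking logarithms
$$\log r = \log 30.3 - \tfrac{1}{2}(\log 70.4 + \log 75.6) = 1.4814 - \tfrac{1}{2}(1.8476 + 1.8785) = 1.4814 - 1.8631 = \bar{1}.6183$$
$$r = \text{antilog}\;\bar{1}.6183 = 0.415$$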
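Steps (i) to (ix) for grouped data amount to weighting every deviation by its cell frequency. The Python sketch below takes the correlation table as a list of rows (Y classes) with one frequency per column (X class), together with the step deviations of the classes, and reproduces Illustration 12. The table layout and the name `r_grouped` are assumptions of this sketch.

```python
import math

def r_grouped(freq, dx, dy):
    """Pearsonian r from a correlation (bivariate frequency) table.

    freq[i][j] is the frequency in row i (a Y class) and column j (an X class);
    dx[j] and dy[i] are the step deviations of that column and row.
    """
    N = sum(sum(row) for row in freq)
    s_fdx = sum(f * dx[j] for row in freq for j, f in enumerate(row))
    s_fdy = sum(f * dy[i] for i, row in enumerate(freq) for f in row)
    s_fdx2 = sum(f * dx[j] ** 2 for row in freq for j, f in enumerate(row))
    s_fdy2 = sum(f * dy[i] ** 2 for i, row in enumerate(freq) for f in row)
    s_fdxdy = sum(f * dx[j] * dy[i]
                  for i, row in enumerate(freq) for j, f in enumerate(row))
    numerator = s_fdxdy - s_fdx * s_fdy / N
    denominator = (math.sqrt(s_fdx2 - s_fdx ** 2 / N)
                   * math.sqrt(s_fdy2 - s_fdy ** 2 / N))
    return numerator / denominator

# Illustration 12: rows = test marks 200-250 ... 350-400, columns = ages 18 ... 21
marks_by_age = [
    [4, 4, 2, 1],
    [3, 5, 4, 2],
    [2, 6, 8, 5],
    [1, 4, 6, 10],
]
r_age_marks = r_grouped(marks_by_age, dx=[-1, 0, 1, 2], dy=[-1, 0, 1, 2])
print(round(r_age_marks, 3))
```

Because r is unaffected by a change of origin and scale, any convenient step deviations may be passed in; only the spacing pattern of the classes matters.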
Illustration 13. The following are the marks obtained by the students of a class
in Statistics and Accountancy :

Roll No. of   Marks in     Marks in       Roll No. of   Marks in     Marks in
students      Statistics   Accountancy    students      Statistics   Accountancy
1             15           13             13            14           11
2             0            1              14            9            3
3             …            2              15            8            5
4             3            7              16            13           4
5             16           8              17            10           10
6             2            9              18            13           …
7             18           12             19            11           14
8             …            9              20            11           7
9             4            17             21            12           18
10            17           16             22            18           15
11            6            6              23            …            15
12            19           18             24            7            3

Prepare a correlation table taking the magnitude of each class interval as
four marks and the first class interval as equal to 0 and less than 4. Calculate Karl
Pearson's coefficient of correlation between the marks in Statistics and marks in
Accountancy and comment on the correlation table. (B. Com., Bombay, 1973)


Solution : PREPARATION OF CORRELATION TABLE

                         Marks in Statistics
Marks in Accountancy   0-4   4-8   8-12   12-16   16-20   Total
0-4                    2     1     1      -       -       4
4-8                    1     1     2      1       -       5
8-12                   1     1     1      2       1       6
12-16                  -     -     2      1       2       5
16-20                  -     1     -      1       2       4
Total                  4     4     6      5       5       24

Let marks in Statistics be denoted by X and marks in Accountancy by Y.
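CALCULATION OF CORRELATION COEFFICIENT
(Step deviations are taken from the mid-value 10 in units of 4 marks for both
series, i.e., $d_x = (X - 10)/4$ and $d_y = (Y - 10)/4$; the figures in brackets
are the cell products $f d_x d_y$.)

                     Marks in Statistics (X)
Accountancy (Y)  dy  0-4 (-2)  4-8 (-1)  8-12 (0)  12-16 (+1)  16-20 (+2)   f    fdy   fdy²   fdxdy
0-4              -2  2 [8]     1 [2]     1 [0]     -           -            4    -8    16     10
4-8              -1  1 [2]     1 [1]     2 [0]     1 [-1]      -            5    -5    5      2
8-12             0   1 [0]     1 [0]     1 [0]     2 [0]       1 [0]        6    0     0      0
12-16            +1  -         -         2 [0]     1 [1]       2 [4]        5    5     5      5
16-20            +2  -         1 [-2]    -         1 [2]       2 [8]        4    8     16     8
f                    4         4         6         5           5            N = 24   Σfdy = 0   Σfdy² = 42   Σfdxdy = 25
fdx                  -8        -4        0         5           10           Σfdx = 3
fdx²                 16        4         0         5           20           Σfdx² = 45
fdxdy                10        1         0         2           12           25

$$r = \frac{\sum f d_x d_y - \dfrac{\sum f d_x \times \sum f d_y}{N}}{\sqrt{\sum f d_x^2 - \dfrac{(\sum f d_x)^2}{N}}\;\sqrt{\sum f d_y^2 - \dfrac{(\sum f d_y)^2}{N}}}$$
$\sum f d_x d_y = 25$, $\sum f d_x = 3$, $\sum f d_y = 0$, $\sum f d_x^2 = 45$, $\sum f d_y^2 = 42$, $N = 24$

$$r = \frac{25 - \dfrac{3 \times 0}{24}}{\sqrt{45 - \dfrac{(3)^2}{24}}\;\sqrt{42 - \dfrac{(0)^2}{24}}} = \frac{25}{\sqrt{44.625}\;\sqrt{42}}$$
Taking logarithms
$$\log r = \log 25 - \tfrac{1}{2}(\log 44.625 + \log 42) = 1.3979 - \tfrac{1}{2}(1.6496 + 1.6232) = 1.3979 - 1.6364 = \bar{1}.7615$$
$$r = \text{antilog}\;\bar{1}.7615 = 0.577$$
The tendency of the frequencies to run from the upper left-hand corner to the lower
right-hand corner of the correlation table indicates positive correlation between the
marks in the two subjects, and the value of r shows that it is moderate.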
Illustration 14. From the following bivariate frequency table calculate the value
of the correlation coefficient.

                       Husband's age (years)
Wife's age (years)   20-25   25-30   30-35   35-40
15-20                20      10      3       2
20-25                4       28      6       4
25-30                -       5       11      -
30-35                -       -       2       -
35-40                -       -       -       5

(B.A., Gujarat, 1974)
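Solution. Let husband's age be denoted by X and wife's age by Y. Step deviations
are taken from the mid-values 32.5 years (X) and 27.5 years (Y) in units of 5 years;
the figures in brackets are the cell products $f d_x d_y$.

CALCULATION OF CORRELATION COEFFICIENT

                      Husband's age (X)
Wife's age (Y)  dy   20-25 (-2)   25-30 (-1)   30-35 (0)   35-40 (+1)    f     fdy   fdy²   fdxdy
15-20           -2   20 [80]      10 [20]      3 [0]       2 [-4]        35    -70   140    96
20-25           -1   4 [8]        28 [28]      6 [0]       4 [-4]        42    -42   42     32
25-30           0    -            5 [0]        11 [0]      -             16    0     0      0
30-35           +1   -            -            2 [0]       -             2     2     2      0
35-40           +2   -            -            -           5 [10]        5     10    20     10
f                    24           43           22          11            N = 100   Σfdy = -100   Σfdy² = 204   Σfdxdy = 138
fdx                  -48          -43          0           11            Σfdx = -80
fdx²                 96           43           0           11            Σfdx² = 150
fdxdy                88           48           0           2             138

$$r = \frac{\sum f d_x d_y - \dfrac{\sum f d_x \times \sum f d_y}{N}}{\sqrt{\sum f d_x^2 - \dfrac{(\sum f d_x)^2}{N}}\;\sqrt{\sum f d_y^2 - \dfrac{(\sum f d_y)^2}{N}}}$$
$\sum f d_x d_y = 138$, $\sum f d_x^2 = 150$, $\sum f d_x = -80$, $\sum f d_y^2 = 204$, $\sum f d_y = -100$, $N = 100$

$$r = \frac{138 - \dfrac{(-80)(-100)}{100}}{\sqrt{150 - \dfrac{(-80)^2}{100}}\;\sqrt{204 - \dfrac{(-100)^2}{100}}} = \frac{138 - 80}{\sqrt{86}\;\sqrt{104}} = \frac{58}{94.57} = 0.613$$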


Assumptions of the Pearsonian Coefficient

Karl Pearson's coefficient of correlation is based on the following
assumptions :

1. There is a linear relationship between the variables, i.e., when the
two variables are plotted on a scatter diagram, a straight line will be
formed by the points so plotted.
2. The two variables under study are affected by a large number of
independent causes so as to form a normal distribution. Variables like
height, weight, price, demand, supply, etc., are affected by such forces that
a normal distribution is formed.

3. There is a cause and effect relationship between the forces
affecting the distribution of the items in the two series. If such a
relationship is not formed between the variables, i.e., if the variables are
independent, there cannot be any correlation. For example, there is no relationship
between income and height because the forces that affect these variables
are not common.

Merits and Limitations of the Pearsonian Coefficient 


Amongst the mathematical methods used for measuring the degree of
relationship, Karl Pearson's method is the most popular. The correlation
coefficient summarizes in one figure not only the degree of correlation but
also its direction, i.e., whether correlation is positive or negative.

However, the utility of this coefficient depends in part on a wide
knowledge of the meaning of this 'yardstick', together with its limitations.
The chief limitations of the method are :

1. The correlation coefficient always assumes a linear relationship,
regardless of whether that assumption is correct or not.

2. Great care must be exercised in interpreting the value of this
coefficient, as very often the coefficient is misinterpreted.

3. The value of the coefficient is unduly affected by extreme
items.

4. As compared with other methods, this method takes more time
to compute the value of the correlation coefficient.
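The first limitation is easy to demonstrate. In the hypothetical data below (not from the text), Y is exactly X squared, so Y is completely determined by X, yet Pearsonian r works out to zero because the relationship is not linear. A minimal Python sketch, with an illustrative helper name:

```python
import math

def pearson_r(X, Y):
    """Pearsonian r via r = Σxy / sqrt(Σx² · Σy²)."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)

X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x * x for x in X]      # Y depends exactly on X, but not linearly

print(pearson_r(X, Y))
```

This is why a value of r close to 0 should be read as an absence of linear relationship, and why a scatter diagram is a sensible first step.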

Interpreting the Coefficient of Correlation 

The coefficient of correlation measures the degree of relationship 
between two sets of figures. As the reliability of estimates depends upon 
the closeness of the relationship it is imperative that utmost care be taken 
while interpreting the value of coefficient of correlation, otherwise 
fallacious conclusions can be drawn. 

Unfortunately, the interpretation of the coefficient of correlation 
depends very much on experience. The full significance of r will only 
be grasped after working out a number of correlation problems and seeing 
the kinds of data that give rise to various values of r. The investigator 
must know his data thoroughly in order to avoid errors of interpretation 
and emphasis. He must be familiar, or become familiar, with all the 
relationships and theory which bear upon the data and should reach a 
conclusion based on logical reasoning and intelligent investigation on 
significantly related matters. However, the following general rules are 
given which would help in interpreting the value of r : 

1. When r = +1, it means there is perfect positive relationship
between the variables.

2. When r = −1, it means there is perfect negative relationship
between the variables.

3. When r = 0, it means that there is no linear relationship between the
variables, i.e., the variables are uncorrelated.
4. The closer r is to +1 or −1, the closer the relationship between
the variables, and the closer r is to 0, the less close the relationship. Beyond
this it is not safe to go. The full interpretation of r depends upon circumstances,
one of which is the size of the sample. All that can really be
said is that when estimating the value of one variable from the value of
another, the higher the value of r, the better the estimate.

5. The closeness of the relationship is not proportional to r. If
the value of r is 0.8, it does not indicate a relationship twice as close as one
of 0.4. It is in fact very much closer: r² is 0.64 against 0.16, so four times
as much of the variation is explained.


*Coefficient of Correlation and Probable Error 


The probable error of the coefficient of correlation helps in interpret- 
ing its value. With the help of probable error it is possible to determine 
the reliability of ihe value of the coefficient in so far as it depends on. 
the conditions of random sampling. ' he probable error of the coefficient 
of correlation is obtained as follows : 

1—7 
p.E,,*—0:6745 
А у 
where r is the coefficient of correlation and N the number of pairs of 
observation. 


1. If the value of r is less than the probable error, there is no
evidence of correlation, i.e., the value of r is not at all significant.

2. If the value of r is more than six times the probable error,
the existence of correlation is practically certain, i.e., the value of r is
significant.

3. By adding the value of probable error to, and subtracting it from,
the coefficient of correlation we get respectively the upper and lower
limits within which the coefficient of correlation in the population can be
expected to lie. Symbolically,

$$\rho = r \pm \text{P.E.}_r$$

where ρ (rho) denotes correlation in the population.


Let us compute the probable error, assuming a coefficient of correlation
of 0.80 and a sample of 16 pairs of items. We will have

$$\text{P.E.}_r = 0.6745\,\frac{1-(0.8)^2}{\sqrt{16}} = 0.6745\times\frac{0.36}{4} = 0.06$$

The limits of the correlation in the population would be r ± P.E.,
i.e., 0.8 ± 0.06, or 0.74 to 0.86.
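The probable-error arithmetic above is easy to script. The Python sketch below is an illustrative helper and not part of the original text. The function names `probable_error`, `standard_error` and `population_limits` are hypothetical. It applies P.E. = 0.6745 (1 − r²)/√N, the standard error formula (the same expression without 0.6745), and the limits r ± P.E. to the worked example.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def standard_error(r: float, n: int) -> float:
    """Standard error of r: the same formula without the 0.6745 factor."""
    return (1 - r ** 2) / math.sqrt(n)


def population_limits(r: float, n: int) -> tuple:
    """Lower and upper limits (r - P.E., r + P.E.) for the population rho."""
    pe = probable_error(r, n)
    return r - pe, r + pe


# Worked example from the text: r = 0.80 from N = 16 pairs of items
pe = probable_error(0.80, 16)
low, high = population_limits(0.80, 16)
print(f"P.E. = {pe:.4f}, S.E. = {standard_error(0.80, 16):.4f}")  # P.E. = 0.0607, S.E. = 0.0900
print(f"rho lies between {low:.2f} and {high:.2f}")               # rho lies between 0.74 and 0.86
```

The text rounds the probable error to 0.06. The unrounded value, 0.0607, gives the same limits to two decimals.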


Instances are quite common wherein a correlation coefficient of 0.5 or
even 0.4 is considered to be a fairly high degree of correlation by
a writer or research worker. Yet a correlation coefficient of 0.5 means that
only 25 per cent of the variation is explained. A correlation coefficient
of 0.4 means that only 16 per cent of the variation is explained.

* If 0.6745 is omitted from the formula of probable error, we get the standard
error of the coefficient of correlation. The standard error of r, therefore, is

$$\text{S.E.}_r = \frac{1-r^2}{\sqrt{N}}$$


*Conditions for the Use of Probable Error

The measure of probable error can be properly used only when
the following three conditions exist:*

1. The data must approximate a normal frequency curve (bell-shaped curve).

2. The statistical measure for which the P.E. is computed must
have been calculated from a sample.

3. The sample must have been selected in an unbiased manner and
the individual items must be independent.

However, these conditions are generally not satisfied, and as such
the reliability of the correlation coefficient is determined largely on
the basis of exterior tests of reasonableness, which are often of a statistical
character.
Illustration 15. If r = 0.6 and N = 64, find out the probable error of the
coefficient of correlation and determine the limits for population r.

Solution :

$$\text{P.E.}_r = 0.6745\,\frac{1-r^2}{\sqrt{N}}, \qquad r = 0.6,\ N = 64$$

$$\text{P.E.}_r = 0.6745\times\frac{1-(0.6)^2}{\sqrt{64}} = 0.6745\times\frac{0.64}{8} = 0.054$$

Limits of population correlation = 0.6 ± 0.054, i.e., 0.546 to 0.654.
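The two rules of thumb and the limits ρ = r ± P.E. can be combined into one check. This is a sketch with hypothetical names. It applies the rules to the numerical value |r>, so that negative coefficients are also handled. Values that lie between one and six times the P.E. are labelled "inconclusive". That label is an assumption of this sketch, because the text gives no rule for that band.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def judge_significance(r: float, n: int) -> tuple:
    """Apply the text's two rules of thumb to |r|.

    Returns (P.E., verdict, (lower, upper)). The 'inconclusive' label for
    values between 1 and 6 times the P.E. is an assumption of this sketch.
    """
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "not significant"
    elif abs(r) > 6 * pe:
        verdict = "significant"
    else:
        verdict = "inconclusive"
    return pe, verdict, (r - pe, r + pe)


# Illustration 15: r = 0.6, N = 64
pe, verdict, (low, high) = judge_significance(0.6, 64)
print(f"P.E. = {pe:.3f} -> {verdict}; limits {low:.3f} to {high:.3f}")
# P.E. = 0.054 -> significant; limits 0.546 to 0.654
```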


Coefficient of Determination

One very convenient and useful way of interpreting the value of the
coefficient of correlation between two variables is to use the square of the
coefficient of correlation, which is called the coefficient of determination. The
coefficient of determination thus equals r². If the value of r = 0.9, r² will
be 0.81, and this would mean that 81 per cent of the variation in the
dependent variable has been explained by the independent variable.
The maximum value of r² is unity because it is possible to explain all of
the variation in Y, but it is not possible to explain more than all of it.

The coefficient of determination (r²) is defined as the ratio of the
explained variance to the total variance:

$$\text{Coefficient of determination} = r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$
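To see that the ratio of explained to total variance equals r², the sketch below fits the least-squares line of Y on X to a small hypothetical data set and compares the two quantities. Sums of squares are used in place of variances, because the common divisor N cancels in the ratio. The regression line is borrowed from the next chapter and is an assumption of this illustration. The sketch also computes the coefficients of non-determination (k²) and alienation (k) described below. All function and variable names are hypothetical.

```python
def explained_and_total_variation(xs, ys):
    """Return (explained, unexplained, total) sums of squares for the
    least-squares line of Y on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx               # slope of the regression of Y on X
    a = my - b * mx             # intercept
    fitted = [a + b * x for x in xs]
    explained = sum((f - my) ** 2 for f in fitted)
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    total = sum((y - my) ** 2 for y in ys)
    return explained, unexplained, total


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
explained, unexplained, total = explained_and_total_variation(xs, ys)
r = pearson_r(xs, ys)
r_squared = explained / total          # coefficient of determination
k_squared = unexplained / total        # coefficient of non-determination
k = k_squared ** 0.5                   # coefficient of alienation
print(round(r ** 2, 4), round(r_squared, 4), round(k_squared, 4), round(k, 4))
# 0.6 0.6 0.4 0.6325
```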
The ratio of unexplained variance to total variance is frequently
called the coefficient of non-determination. The coefficient of
non-determination is denoted by k² (so that k² = 1 − r²), and its square root is called the
coefficient of alienation, or k. The k² and k values may also be used as measures
of the degree of relationship between two variables. For example, the
higher the unexplained variance with respect to total variance, the
higher will be the value of k² and the value of k. However, r² and r
are more convenient in interpreting the result of correlation analysis.

* Riggleman and Frisbee: Business Statistics, p. 239.


It is much easier to understand the meaning of r² than r and, therefore,
the coefficient of determination is to be preferred in presenting the
results of correlation analysis. Tuttle has beautifully pointed out that
"the coefficient of correlation has been grossly overrated and is used
entirely too much. Its square, the coefficient of determination is a much
more useful measure of the linear covariation of two variables. The reader
should develop the habit of squaring every correlation coefficient he finds
cited or stated before coming to any conclusion about the extent of the
linear relationship between the two correlated variables."


The relationship between r and r² may be noted: as the value of r
decreases from its maximum value of 1, the value of r² decreases much
more rapidly. The numerical value of r will, of course, always be larger than r²,
unless r² = 0 or 1, when |r| = r².

   r      r²        r      r²
  0.90   0.81      0.60   0.36
  0.80   0.64      0.50   0.25
  0.70   0.49      0.40   0.16

Thus the coefficient of correlation is 0.707 when just half the variance
in Y is due to X.


It should be clearly noted that the fact that a correlation between
two variables has a value of r = 0.60 and the correlation between two other
variables has a value of r = 0.30 does not demonstrate that the first
correlation is twice as strong as the second. The relationship between the
two given values of r can better be understood by computing the value of
r². When r = 0.6, r² = 0.36, and when r = 0.30, r² = 0.09. The first correlation
thus explains four times as much of the variation as the second.


The coefficient of determination is a highly useful measure. However,
it is often misinterpreted. The term itself may be misleading in that it
implies that the variable X stands in a determining or causal relationship
to the variable Y. The statistical evidence itself never establishes the
existence of such causality. All that the statistical evidence can do is to define
covariation, that term being used in a perfectly neutral sense. Whether
causality is present or not, and which way it runs if it is present, must
be determined on the basis of evidence other than the quantitative
observations.


However, r² is always a positive number (or zero). It cannot tell whether
the relationship between the two variables is positive or negative.
Thus the square root of r², i.e., √r² = ±r, is frequently computed
to indicate the direction of the relationship, in addition to indicating the
degree of relationship. Since the range of r² is from 0 to 1, the coefficient
of correlation r will vary within the range of ±√0 to ±√1, i.e., from −1 to +1.
The + (plus) sign of r will indicate positive correlation, whereas the −
(minus) sign will mean a negative correlation.


*Properties of the Coefficient of Correlation

The following are the important properties of the correlation coefficient, r:

1. The coefficient of correlation lies between −1 and +1. Symbolically,
−1 ≤ r ≤ +1, or |r| ≤ 1.


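Proof. Let x and y be the deviations of the X and Y series from their means, and let σx
and σy be their standard deviations. Expand the function:

$$\sum\left(\frac{x}{\sigma_x}+\frac{y}{\sigma_y}\right)^2=\frac{\sum x^2}{\sigma_x^2}+\frac{\sum y^2}{\sigma_y^2}+\frac{2\sum xy}{\sigma_x\sigma_y}$$

But $\sigma_x^2 = \frac{\sum x^2}{N}$, so $\frac{\sum x^2}{\sigma_x^2} = N$. Similarly, $\frac{\sum y^2}{\sigma_y^2} = N$.

Also, since $r = \frac{\sum xy}{N\sigma_x\sigma_y}$, we have $\frac{2\sum xy}{\sigma_x\sigma_y} = 2Nr$.

Hence

$$\sum\left(\frac{x}{\sigma_x}+\frac{y}{\sigma_y}\right)^2 = N + N + 2Nr = 2N + 2Nr = 2N(1+r)$$

But the left-hand side is a sum of squares of real quantities, and as such it
cannot be negative; the least it can be is zero. Therefore

$$2N(1+r) \ge 0$$

Hence r cannot be less than −1; at the lowest it can be −1.
Similarly, by expanding $\sum\left(\frac{x}{\sigma_x}-\frac{y}{\sigma_y}\right)^2$, it can be shown that this is equal to
2N(1 − r). This again cannot be negative; the least it can be is zero. So
r cannot be greater than +1; at the highest it can be +1.
Hence −1 ≤ r ≤ +1.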
2. The coefficient of correlation is independent of change of scale
and origin of the variables X and Y.

Proof. By change of origin we mean subtracting some constant from every
given value of X and Y, and by change of scale we mean dividing or multiplying every
value of X and Y by some constant.

We know that

$$r = \frac{\sum (X-\bar X)(Y-\bar Y)}{\sqrt{\sum (X-\bar X)^2 \sum (Y-\bar Y)^2}}$$

where X̄ and Ȳ refer to the actual means of the X and Y series.

Let us now change the scale and origin. Deduct a fixed quantity a from X and
b from Y. Also divide the X and Y series by fixed values i and c respectively. After these changes
are introduced, the new values x and y obtained from the original X and Y shall be

$$x = \frac{X-a}{i} \quad\text{and}\quad y = \frac{Y-b}{c}$$

The mean of the new x series is

$$\bar x = \frac{\sum x}{N} = \frac{\sum (X-a)}{iN} = \frac{\sum X - Na}{iN} = \frac{\bar X - a}{i}$$

since ΣX/N = X̄. Similarly, it can be shown that $\bar y = \frac{\bar Y - b}{c}$.

Hence $x - \bar x = \frac{X-\bar X}{i}$ and $y - \bar y = \frac{Y-\bar Y}{c}$. The value of the coefficient of correlation for the new set of values will be

$$r = \frac{\sum (x-\bar x)(y-\bar y)}{\sqrt{\sum (x-\bar x)^2 \sum (y-\bar y)^2}}
= \frac{\frac{1}{ic}\sum (X-\bar X)(Y-\bar Y)}{\sqrt{\frac{1}{i^2}\sum (X-\bar X)^2 \cdot \frac{1}{c^2}\sum (Y-\bar Y)^2}}
= \frac{\sum (X-\bar X)(Y-\bar Y)}{\sqrt{\sum (X-\bar X)^2 \sum (Y-\bar Y)^2}}$$

The last step assumes that i and c are of the same sign (e.g., both positive). If they differ in sign, only the sign of r is reversed.

Thus the coefficient of correlation is independent of change of scale and origin.


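3. The coefficient of correlation is the geometric mean of the two
regression coefficients.*

Symbolically,

$$r = \pm\sqrt{b_{xy}\times b_{yx}}$$

the sign of r being the common sign of the two regression coefficients.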
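The three properties can be checked numerically. The sketch below uses hypothetical paired data and standard deviations with divisor N, as in the proof of property 1. Dividing by a negative constant would reverse the sign of r, so the scale factors used for property 2 are positive. The regression coefficients for property 3 are computed with the usual least-squares formulas, which the text defers to the next chapter.

```python
import math


def mean(vs):
    return sum(vs) / len(vs)


def sd(vs):
    """Standard deviation with divisor N, as in the text."""
    m = mean(vs)
    return math.sqrt(sum((v - m) ** 2 for v in vs) / len(vs))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (len(xs) * sd(xs) * sd(ys))   # r = sum(xy) / (N sx sy)


# Hypothetical paired data, used only to exercise the identities
xs = [2, 4, 5, 7, 9, 10]
ys = [15, 12, 14, 9, 8, 5]
n, r = len(xs), pearson_r(xs, ys)
mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)

# Property 1: the two sums of squares equal 2N(1 + r) and 2N(1 - r), so |r| <= 1
plus = sum(((x - mx) / sx + (y - my) / sy) ** 2 for x, y in zip(xs, ys))
minus = sum(((x - mx) / sx - (y - my) / sy) ** 2 for x, y in zip(xs, ys))
print(math.isclose(plus, 2 * n * (1 + r)), math.isclose(minus, 2 * n * (1 - r)))

# Property 2: subtract a and b, divide by positive i and c; r is unchanged
a, i, b, c = 5, 2, 10, 3
r_shifted = pearson_r([(x - a) / i for x in xs], [(y - b) / c for y in ys])
print(math.isclose(r, r_shifted))

# Property 3: r is the geometric mean of the regression coefficients,
# carrying their common sign
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
b_yx = sxy / sum((x - mx) ** 2 for x in xs)   # regression of Y on X
b_xy = sxy / sum((y - my) ** 2 for y in ys)   # regression of X on Y
r_from_b = math.copysign(math.sqrt(b_xy * b_yx), b_yx)
print(math.isclose(r, r_from_b))
```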


IV. RANK CORRELATION COEFFICIENT 


Karl Pearson's method is based on the assumption that the
population being studied is normally distributed. When it is known
that the population is not normal, or when the shape of the distribution is
not known, there is a need for a measure of correlation that involves no
assumption about the parameters of the population.

It is possible to avoid making any assumptions about the populations
being studied by ranking the observations according to size and
basing the calculations on the ranks rather than upon the original
observations. It does not matter which way the items are ranked: item
number one may be the largest or it may be the smallest. Using
ranks rather than actual observations gives the coefficient of rank
correlation.


This method of finding out covariability, or the lack of it, between
two variables was developed by the British psychologist Charles Edward
Spearman in 1904. This measure is especially useful when quantitative
measures for certain factors (such as in the evaluation of leadership ability
or the judgment of female beauty) cannot be fixed, but the individuals in
the group can be arranged in order, thereby obtaining for each individual
a number indicating his (her) rank in the group. Spearman's rank
correlation coefficient is defined as:

$$r_k = 1 - \frac{6\sum D^2}{N(N^2-1)} \quad\text{or}\quad r_k = 1 - \frac{6\sum D^2}{N^3-N}$$

where $r_k$ denotes the rank coefficient of correlation and D refers to the
difference of ranks between paired items in the two series.

*For proof, please refer to the next chapter on 'Regression'.


The value of this coefficient is interpreted in the same way as Karl
Pearson's correlation coefficient and ranges between +1 and −1. When
$r_k$ is +1, there is complete agreement in the order of the ranks and the
ranks are in the same direction. When $r_k$ is −1, there is complete agreement
in the order of the ranks and they are in opposite directions. This
will be clear from the following:

   R1   R2   D = R1 − R2   D²            R1   R2   D = R1 − R2   D²
    1    1        0         0             1    3       −2         4
    2    2        0         0             2    2        0         0
    3    3        0         0             3    1        2         4
                        ΣD² = 0                               ΣD² = 8

$$r_k = 1 - \frac{6\sum D^2}{N^3-N} = 1 - \frac{6\times 0}{3^3-3} = 1 - 0 = 1
\qquad\qquad
r_k = 1 - \frac{6\times 8}{3^3-3} = 1 - \frac{48}{24} = 1 - 2 = -1$$
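Spearman's formula translates directly into a short function. This is a minimal sketch that assumes untied ranks, since ties are dealt with later under "Equal Ranks". The function name is hypothetical. It reproduces the two limiting cases above, and it can be run on the ranks awarded by the two judges of Illustration 16.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's r_k = 1 - 6*sum(D^2) / (N^3 - N) for untied ranks."""
    if len(ranks_1) != len(ranks_2):
        raise ValueError("both rankings must cover the same N items")
    n = len(ranks_1)
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


print(spearman_from_ranks([1, 2, 3], [1, 2, 3]))   # complete agreement: 1.0
print(spearman_from_ranks([1, 2, 3], [3, 2, 1]))   # complete reversal: -1.0

# The two judges of Illustration 16
judge_x = list(range(1, 13))
judge_y = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]
print(round(spearman_from_ranks(judge_x, judge_y), 3))
```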


In rank correlation we may have two types of problems:

A. Where ranks are given.

B. Where ranks are not given.

A. Where Ranks are Given

Where actual ranks are given to us, the steps required for computing
rank correlation are:

(i) Take the differences of the two ranks, i.e., (R1 − R2), and denote
these differences by D.
(ii) Square these differences and obtain the total ΣD².
(iii) Apply the formula

$$r_k = 1 - \frac{6\sum D^2}{N^3-N}$$
Illustration 16. Two judges in a beauty competition rank the 12 entries as
follows:

X    1   2   3   4   5   6   7   8   9   10   11   12
Y   12   9   6  10   3   5   4   7   8    2   11    1

What degree of agreement is there between the judges?
Solution. CALCULATION OF RANK CORRELATION COEFFICIENT

   X      Y     D = (R1 − R2)     D²
  R1     R2
   1     12         −11          121
   2      9          −7           49
   3      6          −3            9
   4     10          −6           36
   5      3          +2            4
   6      5          +1            1
   7      4          +3            9
   8      7          +1            1
   9      8          +1            1
  10      2          +8           64
  11     11           0            0
  12      1         +11          121
                            ΣD² = 416

$$r_k = 1 - \frac{6\sum D^2}{N^3-N}; \qquad \text{here } \sum D^2 = 416,\ N = 12$$

Substituting the values,

$$r_k = 1 - \frac{6\times 416}{12^3-12} = 1 - \frac{2496}{1716} = 1 - 1.454 = -0.454$$


Illustration 17. Ten competitors in a beauty contest are ranked by three judges
in the following order:

1st Judge   1   6   5  10   3   2   4   9   7   8
2nd Judge   3   5   8   4   7  10   2   1   6   9
3rd Judge   6   4   9   8   1   2   3  10   5   7

Use the rank correlation coefficient to determine which pair of judges has the
nearest approach to common tastes in beauty.

(B. Com. Raj., 1973)


Solution. In order to find out which pair of judges has the nearest approach to
common tastes in beauty, we compare the rank correlation between the judgments of:
(i) 1st Judge and 2nd Judge,
(ii) 2nd Judge and 3rd Judge, and
(iii) 1st Judge and 3rd Judge.


COMPUTATION OF RANK CORRELATION

 Rank by     Rank by     Rank by     (R1 − R2)²   (R2 − R3)²   (R1 − R3)²
1st Judge   2nd Judge   3rd Judge
    R1          R2          R3           D²           D²           D²
     1           3           6            4            9           25
     6           5           4            1            1            4
     5           8           9            9            1           16
    10           4           8           36           16            4
     3           7           1           16           36            4
     2          10           2           64           64            0
     4           2           3            4            1            1
     9           1          10           64           81            1
     7           6           5            1            1            4
     8           9           7            1            4            1
  N = 10      N = 10      N = 10     ΣD² = 200    ΣD² = 214    ΣD² = 60
Rank correlation between the judgments of the 1st and 2nd Judges:

$$r_k = 1 - \frac{6\sum D^2}{N^3-N}; \qquad \sum D^2 = 200,\ N = 10$$

Here we have directly calculated D², because the D's themselves are not required in applying the
formula.

$$r_k\,(\text{I and II}) = 1 - \frac{6\times 200}{10^3-10} = 1 - \frac{1200}{990} = 1 - 1.212 = -0.212$$

Rank correlation between the judgments of the 2nd and 3rd Judges ($\sum D^2 = 214,\ N = 10$):

$$r_k\,(\text{II and III}) = 1 - \frac{6\times 214}{10^3-10} = 1 - \frac{1284}{990} = 1 - 1.297 = -0.297$$

Rank correlation between the judgments of the 1st and 3rd Judges ($\sum D^2 = 60,\ N = 10$):

$$r_k\,(\text{I and III}) = 1 - \frac{6\times 60}{10^3-10} = 1 - \frac{360}{990} = 1 - 0.364 = 0.636$$

Since the coefficient of correlation is maximum in the judgments of the first and
third Judges, we conclude that they have the nearest approach to common tastes in
beauty.
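The comparison of every pair of judges can be automated. The sketch below enumerates all pairs with `itertools.combinations`, applies the untied-rank formula to each, and picks the pair with the highest coefficient. The ranks are those of Illustration 17. The dictionary keys and the function name are hypothetical.

```python
from itertools import combinations


def spearman_from_ranks(ranks_1, ranks_2):
    """r_k = 1 - 6*sum(D^2) / (N^3 - N), assuming no tied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Ranks awarded to the ten competitors (Illustration 17)
judges = {
    "1st": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "2nd": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "3rd": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# Rank correlation for every pair of judges
results = {
    (a, b): spearman_from_ranks(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}
for pair, value in results.items():
    print(pair, round(value, 3))

closest_pair = max(results, key=results.get)   # highest r_k = most similar tastes
print("Nearest approach to common tastes:", closest_pair)
```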
B. Where Ranks are not Given 


When we are given the actual data and not the ranks, it will be
necessary to assign the ranks. Ranks can be assigned by taking either the
highest value as 1 or the lowest value as 1. But whether we start with the
lowest value or the highest value, we must follow the same method in the case
of both the variables.

Illustration 18. Calculate the rank correlation coefficient for the following
data:

X   92   89   87   86   83   77   71   63   53   50
Y   86   83   91   77   68   85   52   82   37   57

(B. Com. Bombay, 1972)


Solution: CALCULATION OF RANK CORRELATION COEFFICIENT

   X    R1     Y    R2    (R1 − R2)² = D²
  92    10    86     9          1
  89     9    83     7          4
  87     8    91    10          4
  86     7    77     5          4
  83     6    68     4          4
  77     5    85     8          9
  71     4    52     2          4
  63     3    82     6          9
  53     2    37     1          1
  50     1    57     3          4
                          ΣD² = 44

$$r_k = 1 - \frac{6\sum D^2}{N^3-N}; \qquad \text{here } \sum D^2 = 44,\ N = 10$$

$$r_k = 1 - \frac{6\times 44}{10^3-10} = 1 - \frac{264}{990} = 1 - 0.267 = 0.733$$
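When raw values are given, the ranks can be assigned by sorting. The sketch below (hypothetical names) rejects tied values, which need the average ranks described next. It also shows that ranking both series from the highest value gives the same ΣD², provided the same direction is used for both variables. The data are those of Illustration 18.

```python
def assign_ranks(values, lowest_first=True):
    """Rank raw values (1 = lowest by default); this sketch rejects ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks")
    ordered = sorted(values, reverse=not lowest_first)
    position = {v: pos for pos, v in enumerate(ordered, start=1)}
    return [position[v] for v in values]


def spearman(xs, ys, lowest_first=True):
    rx = assign_ranks(xs, lowest_first)
    ry = assign_ranks(ys, lowest_first)   # same direction for both variables
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


x = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
y = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
print(assign_ranks(y))   # [9, 7, 10, 5, 4, 8, 2, 6, 1, 3]
print(round(spearman(x, y), 3), round(spearman(x, y, lowest_first=False), 3))
```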


Equal Ranks

In some cases it may be found necessary to rank two or more
individuals or entries as equal. In such a case it is customary to give each
individual an average rank. Thus, if two individuals are ranked equal at
fifth place, they are each given the rank (5 + 6)/2, that is 5.5, while if three
are ranked equal at fifth place, they are given the rank (5 + 6 + 7)/3 = 6. In
other words, where two or more items are to be ranked equal, the rank
assigned for purposes of calculating the coefficient of correlation is the average
of the ranks which these individuals would have got had they differed
slightly from each other.

Where equal ranks are assigned to some entries, an adjustment is made in the
above formula for calculating the rank coefficient of correlation.
The adjustment consists of adding (1/12)(m³ − m) to the value of ΣD², where
m stands for the number of items whose ranks are common. If there is
more than one such group of items with common rank, this value is added
as many times as the number of such groups. The formula can thus be
written:

$$r_k = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3-m) + \frac{1}{12}(m^3-m) + \cdots\right]}{N^3-N}$$


Illustration 19. From the following data of the marks obtained by 8 students
in the Accountancy and Statistics papers, compute the rank coefficient of correlation.

Marks in Accountancy   15   20   28   12   40   60   20   80
Marks in Statistics    40   30   50   30   20   10   30   60
Solution : COMPUTATION OF RANK CORRELATION

 Marks in      Rank        Marks in     Rank
Accountancy   assigned    Statistics   assigned   (R1 − R2)
     X           R1           Y           R2          D          D²
    15           2           40           6          −4        16.00
    20           3.5         30           4          −0.5       0.25
    28           5           50           7          −2         4.00
    12           1           30           4          −3         9.00
    40           6           20           2          +4        16.00
    60           7           10           1          +6        36.00
    20           3.5         30           4          −0.5       0.25
    80           8           60           8           0         0.00
                                                          ΣD² = 81.5

$$r_k = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3-m) + \frac{1}{12}(m^3-m)\right]}{N^3-N}; \qquad \text{here } \sum D^2 = 81.5,\ N = 8$$

The item 20 is repeated 2 times in series X, so m = 2. In series Y the item 30
occurs 3 times, so m = 3.
Substituting these values in the above formula,

$$r_k = 1 - \frac{6\left[81.5 + \frac{1}{12}(2^3-2) + \frac{1}{12}(3^3-3)\right]}{8^3-8} = 1 - \frac{6\,(81.5 + 0.5 + 2)}{504} = 1 - \frac{6\times 84}{504} = 1 - \frac{504}{504} = 0$$
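Average ranks and the (m³ − m)/12 correction can both be computed mechanically. In the sketch below (hypothetical names), each tied group of m equal values occupies consecutive positions p to p + m − 1 in sorted order and receives their mean, p + (m − 1)/2. The correction is then added once for every tied group in either series. The data are those of Illustration 19.

```python
from collections import Counter


def average_ranks(values):
    """Ranks with 1 = lowest; tied items share the mean of the ranks they occupy."""
    ordered = sorted(values)
    first_position = {}
    for pos, v in enumerate(ordered, start=1):
        first_position.setdefault(v, pos)
    counts = Counter(values)
    # a group of m ties starting at position p occupies p .. p+m-1, mean p + (m-1)/2
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n ** 3 - n)


accountancy = [15, 20, 28, 12, 40, 60, 20, 80]
statistics_marks = [40, 30, 50, 30, 20, 10, 30, 60]
print(average_ranks(accountancy))        # [2.0, 3.5, 5.0, 1.0, 6.0, 7.0, 3.5, 8.0]
print(average_ranks(statistics_marks))   # [6.0, 4.0, 7.0, 4.0, 2.0, 1.0, 4.0, 8.0]
print(spearman_with_ties(accountancy, statistics_marks))   # 0.0
```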
Merits and Limitations of the Rank Method

Merits. 1. This method is simpler to understand and easier to
apply than Karl Pearson's method. The answer obtained by
this method will be the same as that obtained by applying Karl Pearson's method
to the ranks, provided no value is repeated, i.e., all the items are different.

2. Where the data are of a qualitative nature, like honesty,
efficiency, intelligence, etc., this method can be used with great advantage.
For example, the workers of two factories can be ranked in order of
efficiency and the degree of correlation established by applying this
method.

3. This is the only method that can be used where we are given the
ranks and not the actual data.

4. Even where actual data are given, the rank method can be applied
for ascertaining a rough degree of correlation.

Limitations. 1. This method cannot be used for finding out correlation
in a grouped frequency distribution.

2. Where the number of items exceeds 30, the calculations become
quite tedious and require a lot of time. Therefore, this method should not
be applied where N exceeds 30, unless we are given the ranks and not the
actual values of the variable.

When to use Rank Correlation Coefficient?

The rank method has two principal uses:
(1) The initial data are in the form of ranks.

(2) If N is fairly small (say, not larger than 25 or 30), the rank method is
sometimes applied to interval data as an approximation to the more
time-consuming r. This requires that the interval data be transferred to rank
orders for both variables. If N is much in excess of 30, the labour required
in ranking the scores becomes greater than is justified by the anticipated
saving of time through the rank formula.


V. CONCURRENT DEVIATION METHOD

This method of studying correlation is the simplest of all the
methods. The only thing that is required under this method is to find out
the direction of change of the X variable and the Y variable. The formula
applicable is:

$$r_c = \pm\sqrt{\pm\frac{2C-n}{n}}$$

where $r_c$ stands for the coefficient of correlation by the concurrent deviation
method; C stands for the number of concurrent deviations, i.e., the number
of positive signs obtained after multiplying $D_x$ with $D_y$; and

n = N − 1, where N = total number of items (n is thus the number of pairs of deviations).
Steps. (i) Find out the direction of change of the X variable, i.e., as
compared with the first value, whether the second value is increasing,
decreasing or constant. If it is increasing, put a + sign; if it is decreasing,
put a − sign (minus sign); and if it is constant, put zero. Similarly,
compared to the second value, find out whether the third value is increasing,
decreasing or constant. Repeat the same process for the other values. Denote
this column by $D_x$.

(ii) In the same manner as discussed above, find out the direction of
change of the Y variable and denote this column by $D_y$.
(iii) Multiply $D_x$ with $D_y$ and determine the value of C, i.e., the
number of positive signs.
(iv) Apply the above formula, i.e.,

$$r_c = \pm\sqrt{\pm\frac{2C-n}{n}}$$

Note. The significance of the ± signs, both inside the under-root and
outside the under-root, is that we cannot take the under-root of a negative quantity.
Therefore, if (2C − n)/n is negative, this negative value multiplied by the
minus sign inside would make it positive, and we can take the under-root.
But the ultimate result would be negative. If (2C − n)/n is positive then, of
course, we get a positive value of the coefficient of correlation.
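The steps above can be sketched in Python. The function names are hypothetical. Directions are coded as +1, −1 or 0, and C counts only the strictly positive products, so a zero in either column is not counted as concurrent. The paired ± signs are handled by taking the root of |(2C − n)/n| and restoring the sign of that ratio. The data are those of Illustrations 20 and 21, which follow.

```python
import math


def direction(change):
    """+1 for an increase, -1 for a decrease, 0 for no change."""
    return (change > 0) - (change < 0)


def concurrent_deviation_r(xs, ys):
    """r_c = +/- sqrt(+/- (2C - n) / n), with n = N - 1 pairs of changes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, N >= 2")
    dx = [direction(b - a) for a, b in zip(xs, xs[1:])]
    dy = [direction(b - a) for a, b in zip(ys, ys[1:])]
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)   # concurrent deviations
    n = len(dx)
    ratio = (2 * c - n) / n
    # The paired +/- signs: take the root of |ratio| and restore its sign.
    return math.copysign(math.sqrt(abs(ratio)), ratio), c


supply = [160, 164, 172, 182, 166, 170, 178, 192, 186]   # Illustration 20
price = [292, 280, 260, 234, 266, 254, 230, 190, 200]
print(concurrent_deviation_r(supply, price))             # (-1.0, 0)

x = [100, 120, 135, 135, 115, 110, 120]                  # Illustration 21
y = [50, 40, 60, 80, 80, 55, 65]
print(concurrent_deviation_r(x, y))                      # (0.0, 3)
```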

Illustration 20. Calculate the coefficient of concurrent deviations from the data
given below:

Year     1963  1964  1965  1966  1967  1968  1969  1970  1971
Supply    160   164   172   182   166   170   178   192   186
Price     292   280   260   234   266   254   230   190   200

(B. Com., Madras, 1972)

Solution :

CALCULATION OF COEFFICIENT OF CONCURRENT DEVIATIONS

         Supply   Direction of   Price   Direction of
Year       X      change Dx        Y     change Dy      DxDy
1963      160                     292
1964      164         +           280        −            −
1965      172         +           260        −            −
1966      182         +           234        −            −
1967      166         −           266        +            −
1968      170         +           254        −            −
1969      178         +           230        −            −
1970      192         +           190        −            −
1971      186         −           200        +            −
                                                      C = 0

$$r_c = \pm\sqrt{\pm\frac{2C-n}{n}}; \qquad C = 0,\ n = 9 - 1 = 8$$

$$r_c = -\sqrt{-\frac{0-8}{8}} = -\sqrt{1} = -1$$
Illustration 21. Find out the coefficient of correlation by the Concurrent Deviation
method from the following data:

X   100   120   135   135   115   110   120
Y    50    40    60    80    80    55    65

Solution: CALCULATION OF CORRELATION BY CONCURRENT DEVIATION METHOD

        Direction of           Direction of
        change of X            change of Y
  X     variable Dx      Y     variable Dy      DxDy
 100                     50
 120        +            40        −             −
 135        +            60        +             +
 135        0            80        +             0
 115        −            80        0             0
 110        −            55        −             +
 120        +            65        +             +
                                              C = 3

$$r_c = \pm\sqrt{\pm\frac{2C-n}{n}}; \qquad \text{here } C = 3,\ n = 6$$

$$r_c = \pm\sqrt{\pm\frac{6-6}{6}} = \sqrt{0} = 0$$
Merits and Limitations of Concurrent Deviation Method

Merits. 1. It is the simplest of all the methods.

2. When the number of items is very large, this method may be used
to form a quick idea about the degree of relationship before making use of
more complicated methods.

Limitations. 1. This method does not differentiate between small
and big changes. For example, if X increases from 100 to 101, the sign
will be plus, and if Y increases from 60 to 160, the sign will also be plus. Thus
both get equal weight when they vary in the same direction.

2. The results obtained by this method are only a rough indicator
of the presence or absence of correlation.

Calculation of Correlation in Time Series

When we observe numerical data in relation to time, the set of
observations so obtained is known as a time series. Time series depict two types
of fluctuations: (i) long-term, and (ii) short-term. While studying
correlation between two time series, it is necessary to study separately the
correlation of long-term changes and short-term changes. The reason is
that the relationship between the long-term changes of two series may be
quite different from that between the short-term changes of these series.
It is quite likely that in two time series there may be negative correlation
between long-term changes and positive correlation between short-term
changes, or vice versa. Hence it becomes necessary to study separately the
correlation between long-term changes and short-term changes, as
otherwise misleading results may be obtained.

*When we attempt to correlate the values of a variable X at certain times with
corresponding values for X at earlier times, such correlation is often called
auto-correlation.
A. Correlation of Long-term Changes

For finding out the correlation of long-term changes, the only thing
required is to determine trend values for both the series by the moving
average method or the method of least squares. After determining these
trend values, correlation can be obtained by the methods discussed above,
and no special method is necessary. The only thing to remember is that
the coefficient of correlation shall be computed from the trend values of the
two series and not from the original data.


B. Calculation of Correlation in Short-term Changes or
Oscillations
Steps. (i) Determine the trend values by the moving average method.
(ii) Deduct from the actual values the corresponding trend values
obtained in step (i). This would give the short-term fluctuations. Denote
these short-term fluctuations by the symbol x for the X series and y for the Y series.

(iii) Square the short-term fluctuations of the X series and obtain the
total Σx².

(iv) Square the short-term fluctuations of the Y series and obtain the
total Σy².

(v) Multiply x with y for each value and obtain the total Σxy.
(vi) Now apply the formula

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$

Here, x denotes the deviation of the X series from the moving average and not from
the arithmetic mean. Similarly, y denotes the deviation of the Y series from the moving
average and not from the arithmetic mean.

Thus we find that the only difference between the correlation explained
earlier and correlation in short-term changes is that whereas in the former
we take deviations from the arithmetic mean, in the latter we take deviations
from the trend values.

Note. It should be carefully noted that x and y in the above formula
are different from the x and y of the Pearsonian formula.
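Steps (i) to (vi) can be sketched as follows. The function names are hypothetical. The sketch uses a centred moving average of odd period, drops the years at either end that have no trend value, and applies r = Σxy/√(Σx² × Σy²) to the deviations from trend. Unlike Illustration 22, it keeps decimals in the averages. It runs on invented series rather than the illustration's data: the long-term trends of the two series move in opposite directions, while their short-term swings coincide (or oppose), which is exactly the situation the text warns about.

```python
import math


def centred_moving_average(values, period):
    """Centred moving average for an odd period; None where undefined."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        trend[t] = sum(values[t - half:t + half + 1]) / period
    return trend


def short_term_r(xs, ys, period=5):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y the deviations
    from the moving-average trend (not from the arithmetic mean)."""
    tx = centred_moving_average(xs, period)
    ty = centred_moving_average(ys, period)
    dev = [(x - a, y - b) for x, y, a, b in zip(xs, ys, tx, ty) if a is not None]
    sxy = sum(p * q for p, q in dev)
    sxx = sum(p * p for p, _ in dev)
    syy = sum(q * q for _, q in dev)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical series: opposite long-term trends, alternating short-term swings
t = range(11)
rising = [10 + 2 * k + (-1) ** k for k in t]
falling_same = [50 - 3 * k + (-1) ** k for k in t]       # swings in step
falling_opposite = [50 - 3 * k - (-1) ** k for k in t]   # swings out of step
print(round(short_term_r(rising, falling_same), 6))       # 1.0
print(round(short_term_r(rising, falling_opposite), 6))   # -1.0
```

Correlating the raw `rising` and `falling_same` series directly would be dominated by their opposite trends. Working with deviations from the moving average isolates the short-term oscillations.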

Illustration 22. Calculate Karl Pearson's coefficient of correlation of the
short-term oscillations for the indices of supply and price of a certain commodity
given here:

Year   Index of Supply   Index of Price     Year   Index of Supply   Index of Price
1960          91               17           1967         107                68
1961          98               97           1968         104                77
1962          95              102           1969          98                93
1963          92              108           1970         100                89
1964          93              105           1971         108                93
1965          9?               96           1972         116                78
1966         102               77           1973         114                84
                                            1974         111                95

[Take a 5-yearly moving average and ignore decimals in computing the average.]

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
Special care must be taken while interpreting a significant
correlation in time series, as it is very likely that the correlation is the result of
other variables which influence both the correlated variables. A high
degree of correlation may be observed between any two economic time
series, and this may be due to the remarkable and fairly steady growth of
the economy.* Yule states that the correlation between radio-receiving
licences granted in the United Kingdom and the number of notified mental
defectives per 10,000 population from 1924 to 1937, inclusive, is very high
(r = 0.998). To the person who attempts to explain it rationally as cause and
effect, this certainly seems to be indeed a "nonsense correlation", as
Yule calls such cases. Karl Pearson called such cases "Spurious
Correlation".† Yule has nicely explained that the two series were increasing
through time, probably from different underlying causes, although the
growth of technology, including medicine and mental health diagnosis and
treatment, may account for the growth of both series. Yule makes the
very valid point that any two time series which are either rising or falling
steadily (one may be rising, the other falling) will show negative or
positive correlation, no matter what the reasons for the variation in
the two series may be. This clearly shows that correlation means little except that
the two series are varying in unison. If the causes are different for the
two series, the correlation may not continue in the future.

In some cases the two correlated time series may not only be acted
upon by other variables, but one or both of them may also influence the
other. For example, one may find a high correlation between industrial
profits and the wages of the workers most of the time. A logical argument
can be made that each has some influence on the other, but both are
greatly influenced by the underlying variables which cause growth and
prosperity (or recession) in the economy.


Lag and Lead in Correlation

The study of lag and lead is of special significance while studying
economic and business series. In the correlation of time series the
investigator may find that there is a time gap before a cause and effect
relationship is established. For example, the supply of a commodity may
increase today, but it may not have an immediate effect on prices; it may
take a few days or even months for prices to adjust to the increased supply.
This difference in the period before a cause and effect relationship is
established is called 'lag'. While computing correlation, this time gap
must be considered; otherwise fallacious conclusions may be drawn. The
pairing of items is adjusted according to the time lag.

*Tuttle, A.M.: Elementary Business and Economic Statistics, p. 448.

†Snedecor objects to Pearson's term "Spurious Correlation" on the ground
that it is the interpretation which may be erroneously given which is spurious, not
the correlation. If two variables vary together, they are correlated. However, one
should use considerable care and judgment in deciding which correlated variables should
be studied, and in interpreting the correlations which are discovered.
If the supply affects the price, say, after 3 months, then the pairing
would be done as follows (the supply of January is paired with the price of April,
that of February with the price of May, and so on):

Month    Supply   Price
Jan.      100       70
Feb.      105       69
March     106       80
April     112       ?
May       118       ?
June      120       70
July      125       74
Aug.      104       76
Sept.     122       78
Oct.      116       80
Nov.      122       ?
Dec.      127       ?

Taking the new pairs of values, the correlation can be calculated in the same
manner as discussed earlier.

Illustration 23. The following are the monthly figures of advertising
expenditure and sales of a firm. It is generally found that advertising
expenditure has its impact on sales after 2 months. Allowing for this time
lag, calculate the coefficient of correlation.

Month   Advertising    Sales     Month   Advertising    Sales
        Expenditure     Rs.              Expenditure     Rs.
            Rs.                              Rs.
Jan.         50        1,200     July        140        2,400
Feb.         60        1,500     Aug.        160        2,600
March        70        1,600     Sept.       170        2,800
April        90        2,000     Oct.        190        2,900
May         120        2,200     Nov.        200        3,100
June        150        2,500     Dec.        250        3,900

Solution. Allow for a time lag of 2 months, i.e., link the advertising expenditure
of January with the sales for March, and so on.
CALCULATION OF CORRELATION COEFFICIENT

         Advertising                               Sales
Month    Expenditure   x = (X − 120)/10   x²    (2 months later)   y = (Y − 2,600)/100    y²     xy
              X                                        Y
Jan.          50             −7           49         1,600                −10             100     70
Feb.          60             −6           36         2,000                 −6              36     36
March         70             −5           25         2,200                 −4              16     20
April         90             −3            9         2,500                 −1               1      3
May          120              0            0         2,400                 −2               4      0
June         150             +3            9         2,600                  0               0      0
July         140             +2            4         2,800                 +2               4      4
Aug.         160             +4           16         2,900                 +3               9     12
Sept.        170             +5           25         3,100                 +5              25     25
Oct.         190             +7           49         3,900                +13             169     91
       ΣX = 1,200         Σx = 0    Σx² = 222    ΣY = 26,000          Σy = 0       Σy² = 364  Σxy = 261

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}; \qquad \bar X = \frac{1{,}200}{10} = 120,\quad \bar Y = \frac{26{,}000}{10} = 2{,}600$$

$$\sum xy = 261,\quad \sum x^2 = 222,\quad \sum y^2 = 364$$

$$r = \frac{261}{\sqrt{222\times 364}} = \frac{261}{\sqrt{80{,}808}} = \frac{261}{284.27} = +0.918$$
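The lag adjustment amounts to shifting one series against the other before correlating. In the sketch below (hypothetical names), `lagged_pairs` pairs cause[t] with effect[t + lag] and drops the last `lag` causes, as the solution does for November and December. Scale invariance means that the raw figures give the same r as the step deviations in the table. The loop over several lags is an extra illustration, not part of the original solution.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_pairs(cause, effect, lag):
    """Pair cause[t] with effect[t + lag]; the last `lag` causes drop out."""
    if lag < 0 or lag >= len(cause):
        raise ValueError("lag must satisfy 0 <= lag < len(cause)")
    return cause[:len(cause) - lag], effect[lag:]


# Monthly figures, January to December (Illustration 23)
advertising = [50, 60, 70, 90, 120, 150, 140, 160, 170, 190, 200, 250]
sales = [1200, 1500, 1600, 2000, 2200, 2500, 2400, 2600, 2800, 2900, 3100, 3900]

x, y = lagged_pairs(advertising, sales, lag=2)   # Jan with March, ..., Oct with Dec
print(len(x), round(pearson_r(x, y), 3))         # 10 0.918

# The same pairing idea lets one compare several candidate lags
for lag in range(4):
    print(lag, round(pearson_r(*lagged_pairs(advertising, sales, lag)), 3))
```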
MISCELLANEOUS ILLUSTRATIONS
Illustration 24. Find the correlation coefficient between age and playing habit
of the following students:

Age                 15    16    17    18    19    20
No. of students    250   200   150   120   100    80
Regular players    200   150    90    48    30    12

(B. Com., Rajasthan 1974)


Solution. Let us first find the percentage of regular players and then calculate
the correlation between age and the percentage so obtained.

CALCULATION OF CORRELATION COEFFICIENT

Age   (X − X̄)          No. of     Regular   Percentage of       (Y − Ȳ)
 X       x       x²    students   players   regular players Y      y        y²       xy
15     −2.5    6.25      250        200           80              +30      900    −75.00
16     −1.5    2.25      200        150           75              +25      625    −37.50
17     −0.5    0.25      150         90           60              +10      100     −5.00
18     +0.5    0.25      120         48           40              −10      100     −5.00
19     +1.5    2.25      100         30           30              −20      400    −30.00
20     +2.5    6.25       80         12           15              −35     1225    −87.50
ΣX = 105  Σx = 0  Σx² = 17.5                  ΣY = 300           Σy = 0  Σy² = 3350  Σxy = −240

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}; \qquad \bar X = \frac{105}{6} = 17.5,\quad \bar Y = \frac{300}{6} = 50$$

$$\sum xy = -240,\quad \sum x^2 = 17.5,\quad \sum y^2 = 3350$$

$$r = \frac{-240}{\sqrt{17.5\times 3350}}$$

Taking logarithms of the numerical value,

$$\log |r| = \log 240 - \tfrac{1}{2}(\log 17.5 + \log 3350) = 2.3802 - \tfrac{1}{2}(1.2430 + 3.5250) = 2.3802 - \tfrac{1}{2}(4.7680) = 2.3802 - 2.3840 = \bar{1}.9962$$

$$r = -\operatorname{antilog}\bar{1}.9962 = -0.991$$

There is thus an inverse correlation between age and playing habit, i.e., as
age increases, the tendency to play decreases.
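The two steps of this solution can be scripted: first convert the counts to percentages, then apply the product-moment formula to the deviations from the means. The sketch below is an illustration only, and its variable names are hypothetical. It also prints the intermediate sums, so that they can be compared with the table, and it replaces the logarithm step with a direct square root.

```python
import math

ages = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
regular_players = [200, 150, 90, 48, 30, 12]

# Step 1: convert counts to percentages so that unequal group sizes compare fairly
pct = [100 * p / s for p, s in zip(regular_players, students)]

# Step 2: deviations from the means and the product-moment formula
mx, my = sum(ages) / len(ages), sum(pct) / len(pct)
x = [a - mx for a in ages]
y = [p - my for p in pct]
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(pct)                                    # [80.0, 75.0, 60.0, 40.0, 30.0, 15.0]
print(sum_xy, sum_x2, sum_y2, round(r, 3))    # -240.0 17.5 3350.0 -0.991
```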


Illustration 25. Calculate Karl Pearson's coefficient of correlation from the
data given below:

                  Age in years
Marks      18    19    20    21    22
20–25       3     2     –     –     –
15–20       –     5     4     –     –
10–15       –     –     7    10     –
5–10        –     –     –     3     2
0–5         –     –     –     3     1

(B. Com., Delhi, 1974)
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Solution, Let age be denoted by X and marks by Y. 


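$r = \frac{\sum f d_x d_y - \frac{\sum f d_x \times \sum f d_y}{N}}{\sqrt{\sum f d_x^2 - \frac{(\sum f d_x)^2}{N}} \times \sqrt{\sum f d_y^2 - \frac{(\sum f d_y)^2}{N}}}$

CALCULATION OF COEFFICIENT OF CORRELATION

(d_x: deviation of age from 20 years; d_y: deviation of marks from the class 10-15, in units of the class interval)

               Age (X):    18    19    20    21    22
Marks (Y)   d_y   d_x:     −2    −1     0    +1    +2      f      fd_y    fd_y²    fd_xd_y
20-25       +2              3     2     -     -     -      5       10      20       −16
15-20       +1              -     5     4     -     -      9        9       9        −5
10-15        0              -     -     7    10     -     17        0       0         0
5-10        −1              -     -     -     3     2      5       −5       5        −7
0-5         −2              -     -     -     3     1      4       −8      16       −10
f                           3     7    11    16     3    N = 40   Σfd_y = 6   Σfd_y² = 50   Σfd_xd_y = −38
fd_x                       −6    −7     0    16     6    Σfd_x = 9
fd_x²                      12     7     0    16    12    Σfd_x² = 47
fd_xd_y                   −12    −9     0    −9    −8    Σfd_xd_y = −38

$\sum f d_x d_y = -38$, $\sum f d_x^2 = 47$, $\sum f d_x = 9$, $\sum f d_y^2 = 50$, $\sum f d_y = 6$, $N = 40$
$r = \frac{-38 - \frac{9 \times 6}{40}}{\sqrt{47 - \frac{(9)^2}{40}} \times \sqrt{50 - \frac{(6)^2}{40}}} = \frac{-38 - 1.35}{\sqrt{44.975 \times 49.1}} = \frac{-39.35}{46.99} = -0.837$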
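For a correlation table the same formula runs over cell frequencies and step deviations. The Python sketch below assumes our transcription of the table above (zeros stand for dashes); the helper name `grouped_r` is hypothetical.

```python
import math

# Cell frequencies: rows are marks classes 20-25, 15-20, 10-15, 5-10, 0-5;
# columns are ages 18, 19, 20, 21, 22 (0 where the table shows a dash).
freq = [
    [3, 2, 0, 0, 0],
    [0, 5, 4, 0, 0],
    [0, 0, 7, 10, 0],
    [0, 0, 0, 3, 2],
    [0, 0, 0, 3, 1],
]
dx = [-2, -1, 0, 1, 2]  # deviations of age from 20 years
dy = [2, 1, 0, -1, -2]  # step deviations of marks from class 10-15


def grouped_r(freq, dx, dy):
    """Pearson's r for a bivariate frequency table using step deviations."""
    n = s_dx = s_dx2 = s_dy = s_dy2 = s_dxdy = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            n += f
            s_dx += f * dx[j]
            s_dx2 += f * dx[j] ** 2
            s_dy += f * dy[i]
            s_dy2 += f * dy[i] ** 2
            s_dxdy += f * dx[j] * dy[i]
    num = s_dxdy - s_dx * s_dy / n
    den = math.sqrt((s_dx2 - s_dx ** 2 / n) * (s_dy2 - s_dy ** 2 / n))
    return num / den, (n, s_dx, s_dx2, s_dy, s_dy2, s_dxdy)


r, totals = grouped_r(freq, dx, dy)
print(totals)       # (40, 9, 47, 6, 50, -38)
print(round(r, 3))  # -0.837
```

Because r is unaffected by change of origin and scale, step deviations give the same value as working with the actual ages and class midpoints.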


Illustration 28. A department store gives in-service training to its salesmen,
which is followed by a test. It is considering whether it should terminate the services of
any salesman who does not do well in the test. The following data give the test
scores and sales made by nine salesmen during a certain period:

Test scores       14   19   24   21   26   22   15   20   19
Sales (00 Rs)     31   36   48   37   50   45   33   41   39

Calculate the correlation coefficient between the test scores and sales. Does it
indicate that the termination of services of salesmen with low test scores is justified?
(C.A. 1974) 


CALCULATION OF CORRELATION COEFFICIENT

Test scores   (X − X̄)   x²    Sales   (Y − Ȳ)   y²    xy
X             x               Y       y
14            −6        36    31      −9        81    +54
19            −1         1    36      −4        16     +4
24            +4        16    48      +8        64    +32
21            +1         1    37      −3         9     −3
26            +6        36    50     +10       100    +60
22            +2         4    45      +5        25    +10
15            −5        25    33      −7        49    +35
20             0         0    41      +1         1      0
19            −1         1    39      −1         1     +1
ΣX = 180   Σx = 0   Σx² = 120   ΣY = 360   Σy = 0   Σy² = 346   Σxy = 193

$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$
$\bar{X} = \frac{\sum X}{N} = \frac{180}{9} = 20$; $\bar{Y} = \frac{\sum Y}{N} = \frac{360}{9} = 40$
$\sum xy = 193$, $\sum x^2 = 120$, $\sum y^2 = 346$
$r = \frac{193}{\sqrt{120 \times 346}}$
Taking logarithms:
$\log r = \log 193 - \tfrac{1}{2}(\log 120 + \log 346) = 2.2856 - \tfrac{1}{2}(2.0792 + 2.5391) = 2.2856 - \tfrac{1}{2}(4.6183)$
$= 2.2856 - 2.3091 = \bar{1}.9765$
$r = \text{antilog}\ \bar{1}.9765 = 0.947$


There is a high degree of positive correlation between test scores and sales.
It may be concluded from this that the termination of services of salesmen with low test scores is
justified.
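The same coefficient can also be reached from raw totals, without first finding the means, using the algebraically equivalent form $r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$. A minimal Python sketch applied to the test-score data; `raw_score_r` is an illustrative name.

```python
import math


def raw_score_r(xs, ys):
    """Pearson's r from raw totals (no deviations needed)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


test_scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]
sales = [31, 36, 48, 37, 50, 45, 33, 41, 39]  # in hundreds of rupees
print(round(raw_score_r(test_scores, sales), 3))  # 0.947
```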


Illustration 29. (a) The coefficient of correlation between two variates X and Y is 0.3 and
their covariance is 9. The variance of X is 16. Find the standard deviation of the Y
series.

(b) The coefficient of rank correlation between debenture prices and share prices
is found to be 0.143. If the sum of the squares of the differences in ranks is given to
be 48, find the value of N.


Solution. (a) Covariance means $\frac{\sum xy}{N}$, where x and y are the deviations of the
values of the X and Y series from their respective means.
Variance of X series is 16, i.e., $\sigma_x^2 = 16$, so $\sigma_x = \sqrt{16} = 4$.
$r = \frac{\text{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum xy / N}{\sigma_x \sigma_y}$
Substituting the given values, we get
$0.3 = \frac{9}{4 \times \sigma_y}$, or $1.2\,\sigma_y = 9$
$\sigma_y = 7.5$
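(b) $R = 1 - \frac{6\sum D^2}{N^3 - N}$; here R = 0.143 and ΣD² = 48.
$0.143 = 1 - \frac{6 \times 48}{N^3 - N} = 1 - \frac{288}{N^3 - N}$
$\frac{288}{N^3 - N} = 1 - 0.143 = 0.857$
$0.857\,(N^3 - N) = 288$, or $N^3 - N = \frac{288}{0.857} \approx 336$
or $N^3 - N - 336 = 0$, i.e., $N^3 - 7N^2 + 7N^2 - 49N + 48N - 336 = 0$
$N^2(N - 7) + 7N(N - 7) + 48(N - 7) = 0$
or $(N - 7)(N^2 + 7N + 48) = 0$
either N − 7 = 0, i.e., N = 7,
or N² + 7N + 48 = 0.
Since $b^2 - 4ac = 49 - 192$ is negative, the roots of the quadratic belong to the set of complex numbers and are inadmissible.
Hence N = 7.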
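Because N must be a whole number, the cubic can also be settled by a short search. The Python sketch below assumes the reported coefficient is rounded to three decimals and returns the first N whose Spearman formula reproduces it; both helper names are hypothetical.

```python
def spearman_from_d2(sum_d2, n):
    """Spearman's rank correlation R = 1 - 6*sum(D^2) / (N^3 - N)."""
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def n_from_rank_corr(rho, sum_d2, n_max=1000):
    """Smallest N for which the formula, rounded to 3 decimals, equals rho."""
    for n in range(2, n_max + 1):
        if round(spearman_from_d2(sum_d2, n), 3) == round(rho, 3):
            return n
    return None  # no whole number of items fits


print(n_from_rank_corr(0.143, 48))  # 7
```

A search is used instead of factorising because 288/0.857 is only approximately 336; rounding the coefficient is what makes the cubic come out exact.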

Illustration 30. The coefficient of rank correlation of the marks obtained by 10
students in statistics and accountancy was found to be 0.8. It was later discovered
that the difference in ranks in the two subjects obtained by one of the students was
wrongly taken as 7 instead of 9. Find the correct coefficient of rank correlation.
(B. Com., Bombay, 1972)


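Solution:
$R = 1 - \frac{6\sum D^2}{N^3 - N}$; here R = 0.8, N = 10
$0.8 = 1 - \frac{6\sum D^2}{10^3 - 10} = 1 - \frac{6\sum D^2}{990}$
$\frac{6\sum D^2}{990} = 0.2$, so $6\sum D^2 = 198$ and $\sum D^2 = 33$
But this is not the correct ΣD².
Correct $\sum D^2 = 33 - (7)^2 + (9)^2 = 33 - 49 + 81 = 65$
$R = 1 - \frac{6 \times 65}{990} = 1 - 0.394 = 0.606$

Thus the correct value of the rank correlation coefficient in this case is 0.606.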
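The back-calculation above works for any single mis-recorded rank difference. A Python sketch with hypothetical helper names, assuming only one difference was wrong:

```python
def spearman_from_d2(sum_d2, n):
    """Spearman's rank correlation R = 1 - 6*sum(D^2) / (N^3 - N)."""
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def corrected_rank_corr(rho_wrong, n, wrong_d, correct_d):
    """Recover sum(D^2) from the reported R, swap one difference, recompute R."""
    sum_d2_wrong = (1 - rho_wrong) * (n ** 3 - n) / 6
    sum_d2 = sum_d2_wrong - wrong_d ** 2 + correct_d ** 2
    return sum_d2_wrong, sum_d2, spearman_from_d2(sum_d2, n)


wrong, right, rho = corrected_rank_corr(0.8, 10, 7, 9)
print(round(wrong, 6), round(right, 6), round(rho, 3))  # 33.0 65.0 0.606
```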

Illustration 31. A computer, while calculating the correlation coefficient between
two variables X and Y, obtained the following constants:

N = 30, ΣX = 120, ΣX² = 600, ΣY = 90, ΣY² = 250, ΣXY = 356

It was, however, later discovered at the time of checking that it had copied down
two pairs of observations as:

X     Y
8     10
12     7

while the correct values were:

X     Y
8     12
10     8

Obtain the correct value of the correlation coefficient between X and Y.
(M.A. Econ., Punjab, 1973)


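Solution:

Correct ΣX = 120 − 8 − 12 + 8 + 10 = 118

Correct ΣY = 90 − 10 − 7 + 12 + 8 = 93

Correct ΣX² = 600 − (8)² − (12)² + (8)² + (10)² = 600 − 64 − 144 + 64 + 100 = 556
Correct ΣY² = 250 − (10)² − (7)² + (12)² + (8)² = 250 − 100 − 49 + 144 + 64 = 309

Correct ΣXY = 356 − (8 × 10) − (12 × 7) + (8 × 12) + (10 × 8)
= 356 − 80 − 84 + 96 + 80 = 368

$r = \frac{\frac{\sum XY}{N} - \bar{X}\bar{Y}}{\sqrt{\frac{\sum X^2}{N} - \bar{X}^2}\,\sqrt{\frac{\sum Y^2}{N} - \bar{Y}^2}}$
The means must also be computed from the corrected totals:
ΣXY = 368; X̄ = 118/30 = 3.9333; Ȳ = 93/30 = 3.1; ΣX² = 556; ΣY² = 309; N = 30
$r = \frac{\frac{368}{30} - (3.9333 \times 3.1)}{\sqrt{\frac{556}{30} - (3.9333)^2}\,\sqrt{\frac{309}{30} - (3.1)^2}} = \frac{12.2667 - 12.1933}{\sqrt{18.5333 - 15.4711}\,\sqrt{10.3 - 9.61}}$
$= \frac{0.0733}{\sqrt{3.0622 \times 0.69}} = \frac{0.0733}{1.4536} = 0.05$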
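The correction of the totals, and the need to recompute the means from the corrected totals, can be captured in a short Python sketch. It uses the raw-total form of r (equivalent to the mean form above); the helper names are hypothetical.

```python
import math


def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson's r from N and the five raw totals."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


def correct_sums(sx, sy, sxx, syy, sxy, wrong_pairs, right_pairs):
    """Remove wrongly copied (X, Y) pairs from the totals, then add the true ones."""
    for x, y in wrong_pairs:
        sx, sy, sxx, syy, sxy = sx - x, sy - y, sxx - x * x, syy - y * y, sxy - x * y
    for x, y in right_pairs:
        sx, sy, sxx, syy, sxy = sx + x, sy + y, sxx + x * x, syy + y * y, sxy + x * y
    return sx, sy, sxx, syy, sxy


sums = correct_sums(120, 90, 600, 250, 356,
                    wrong_pairs=[(8, 10), (12, 7)],
                    right_pairs=[(8, 12), (10, 8)])
corrected = r_from_sums(30, *sums)
print(sums)                 # (118, 93, 556, 309, 368)
print(round(corrected, 3))  # about 0.05
```

Since the raw-total form never uses the means explicitly, it avoids the easy slip of reusing the uncorrected means 120/30 and 90/30.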
Illustration 32. A correlation coefficient of 0.5 does not mean that 50% of the
data are explained. Comment. (B.A. Hons. Econ., Delhi, 1970)

Solution. The coefficient of correlation does not indicate the percentage of
data that is explained. To interpret the correlation coefficient correctly we have to
find the value of the coefficient of determination. The coefficient of determination is
found by squaring r, and it gives an idea of what proportion of the variation of Y is
explained by the variation in X. Thus (0.5)², or 25%, of the variation of Y is explained
by the linear regression fitted to the data. Even this does not mean that 25% of
the data are explained. It only means that, out of the total variation of Y, only 25% is due
to X and the rest is due to other factors.

Illustration. Calculate the number of items for which r = +0.8, Σxy = 200,
standard deviation of Y = 5 and Σx² = 100, where x and y denote deviations of items
from actual means.

Solution.
$\sigma_y = \sqrt{\frac{\sum y^2}{N}}$, i.e., $5 = \sqrt{\frac{\sum y^2}{N}}$
Squaring, $25 = \frac{\sum y^2}{N}$, or $\sum y^2 = 25N$
$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$
Substituting the given values,
$0.8 = \frac{200}{\sqrt{100 \times 25N}}$; squaring, $0.64 = \frac{(200)^2}{2500N} = \frac{40000}{2500N}$
$0.64 \times 2500\,N = 40000$
$1600\,N = 40000$
$N = \frac{40000}{1600} = 25$


Illustration 33. If the covariance between X and Y variables is 10 and the variances
of X and Y are respectively 16 and 9, find the coefficient of correlation.

Solution.
Covariance of X and Y = $\frac{\sum xy}{N} = 10$
Variance of X = 16, ∴ $\sigma_x = \sqrt{16} = 4$
Variance of Y = 9, ∴ $\sigma_y = \sqrt{9} = 3$
$r = \frac{\sum xy / N}{\sigma_x \sigma_y}$
Substituting the values, $r = \frac{10}{4 \times 3} = \frac{10}{12} = +0.833$


Illustration 34. The coefficient of correlation between X and Y for 20 items is 0.3;
the mean of X is 15 and that of Y 20, and the standard deviations are 4 and 5 respectively. At
the time of calculation one item, 27, was wrongly taken as 17 in the X series, and
35 was taken instead of 30 in the Y series. Find the correct coefficient of correlation.


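Solution.
$\bar{X} = \frac{\sum X}{N}$; here N = 20, X̄ = 15
∴ ΣX = 20 × 15 = 300
Correct ΣX = 300 − 17 + 27 = 310
Correct X̄ = 310/20 = 15.5

$\bar{Y} = \frac{\sum Y}{N}$; here N = 20, Ȳ = 20
∴ ΣY = 20 × 20 = 400
Correct ΣY = 400 − 35 + 30 = 395
Correct Ȳ = 395/20 = 19.75

Similarly, find the correct standard deviations.
$\sigma_x = \sqrt{\frac{\sum X^2}{N} - (\bar{X})^2}$, i.e., $4 = \sqrt{\frac{\sum X^2}{20} - (15)^2}$; squaring, $16 = \frac{\sum X^2}{20} - 225$
∴ ΣX² = 20 × 241 = 4820
Correct ΣX² = 4820 − (17)² + (27)² = 4820 − 289 + 729 = 5260
Correct $\sigma_x = \sqrt{\frac{5260}{20} - (15.5)^2} = \sqrt{263 - 240.25} = \sqrt{22.75} = 4.770$

$\sigma_y = \sqrt{\frac{\sum Y^2}{N} - (\bar{Y})^2}$, i.e., $5 = \sqrt{\frac{\sum Y^2}{20} - (20)^2}$; squaring, $25 = \frac{\sum Y^2}{20} - 400$
∴ ΣY² = 20 × 425 = 8500
Correct ΣY² = 8500 − (35)² + (30)² = 8500 − 1225 + 900 = 8175
Correct $\sigma_y = \sqrt{\frac{8175}{20} - (19.75)^2} = \sqrt{408.75 - 390.06} = \sqrt{18.69} = 4.323$

Next, calculate Σxy with the uncorrected figures.
$r = \frac{\sum xy}{N \sigma_x \sigma_y}$
Substituting the values we get $0.3 = \frac{\sum xy}{20 \times 4 \times 5}$, or $\sum xy = 400 \times 0.3 = 120$
The correction must be made on the raw sum ΣXY, because correcting the means changes the deviations of every item, not only those of the wrong item. Since $\sum xy = \sum XY - N\bar{X}\bar{Y}$,
uncorrected ΣXY = 120 + (20 × 15 × 20) = 6120
Correct ΣXY = 6120 − (17 × 35) + (27 × 30) = 6120 − 595 + 810 = 6335
Correct Σxy = 6335 − (20 × 15.5 × 19.75) = 6335 − 6122.5 = 212.5

$r = \frac{\sum xy}{N \sigma_x \sigma_y} = \frac{212.5}{20 \times 4.770 \times 4.323} = \frac{212.5}{412.41} = 0.515$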
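The raw-sum route used in this solution can be written as a small Python sketch. It assumes that the given standard deviations use divisor N (as in the formulas above) and that the wrong X and Y values belong to the same pair of observations; the function name `corrected_r` is hypothetical.

```python
import math


def corrected_r(n, r, mean_x, mean_y, sd_x, sd_y, wrong, right):
    """Correct r after one (X, Y) pair was mis-recorded, via raw sums."""
    # Rebuild raw totals from the summary statistics (divisor N)
    sx, sy = n * mean_x, n * mean_y
    sxx = n * (sd_x ** 2 + mean_x ** 2)
    syy = n * (sd_y ** 2 + mean_y ** 2)
    sxy = n * (r * sd_x * sd_y + mean_x * mean_y)
    # Swap the wrong pair for the right one
    (wx, wy), (cx, cy) = wrong, right
    sx, sy = sx - wx + cx, sy - wy + cy
    sxx, syy = sxx - wx ** 2 + cx ** 2, syy - wy ** 2 + cy ** 2
    sxy = sxy - wx * wy + cx * cy
    # Recompute means, covariance and variances from the corrected totals
    mx, my = sx / n, sy / n
    cov = sxy / n - mx * my
    vx = sxx / n - mx ** 2
    vy = syy / n - my ** 2
    return cov / math.sqrt(vx * vy)


r = corrected_r(20, 0.3, 15, 20, 4, 5, wrong=(17, 35), right=(27, 30))
print(round(r, 3))  # 0.515
```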


11 Regression Analysis


After having established the fact that two variables are closely related,
we may be interested in estimating (predicting) the value of one variable
given the value of another. For example, if we know that advertising and
sales are correlated, we may find out the expected amount of sales for a
given advertising expenditure or the required amount of expenditure for
attaining a given amount of sales. Similarly, if we know that the yield of
rice and rainfall are closely related, we may find out the amount of rain
required to achieve a certain production figure. The statistical tool with
the help of which we are in a position to estimate (or predict) the unknown
values of one variable from known values of another variable is called
regression. Regression thus reveals the average relationship between two
variables, and this makes estimation or prediction possible.


The dictionary meaning of the term ‘regression’ is the act of return-
ing or going back. The term ‘regression’ was first used by Sir Francis
Galton in 1877 while studying the relationship between the heights of
fathers and sons. This term was introduced by him in the paper ‘Regres-
sion towards Mediocrity in Hereditary Stature’. His study of the heights of
about one thousand fathers and sons revealed a very interesting relation-
ship, i.e., tall fathers tend to have tall sons and short fathers short sons,
but the average height of the sons of a group of tall fathers is less than that
of the fathers, and the average height of the sons of a group of short fathers
is greater than that of the fathers. The line describing this tendency to
regress, or go back, was called by Galton a ‘Regression Line’. The term
is still used to describe the line drawn for a group of points to represent
the trend present, but it no longer necessarily carries the original impli-
cation of ‘stepping back’ that Galton intended. These days there is a
growing tendency among modern writers to use the term estimating line
instead of regression line, because the expression estimating line is more
self-explanatory.


Regression analysis is a branch of statistical theory that is widely 
used in almost all the scientific disciplines. In economics it is the basic 
technique for measuring or estimating the relationship among economic 
variables that constitute the essence of economic theory and economic life. 
For example, if we know that two variables, price (X) and demand (Y),
are closely related, we can find out the most probable value of X for a
given value of Y or the most probable value of Y for a given value of X.
Similarly, if we know that the amount of tax and the rise in the price of a 
commodity are closely related, we can find out the expected price for a 
certain amount of tax levy. Thus we find that the study of regression is 
of considerable help to the economists and businessmen. 

The uses of regression are not confined to the economics and business 
fields only. Its applications are extended to almost all the natural, 
physical and social sciences. 

The tool of regression can be extended to three or more variables. 
But in this text we shall confine ourselves to the problems of two variables 
only, i.e., simple regression. 
Difference between Correlation and Regression Analysis


Following are the points of difference between correlation and regres- 
sion : 

1. Whereas correlation coefficient is a measure of degree of covari- 
ability between X and Y, the objective of regression analysis is to study the 
‘nature of relationship’ between the variables so that we may be able to 
predict the value of one on the basis of another. The closer the relation-
ship between two variables, the greater the confidence that may be placed
in the estimates. Conventionally, the variable which is the basis of pre-
diction is called the independent variable and the variable that is to be
predicted is referred to as the dependent variable.


2. The cause and effect relation is more clearly indicated through regres-
sion analysis than by correlation. Correlation is merely a tool for ascertain-
ing the degree of relationship between two variables and, therefore, we
cannot say that one variable is the cause and the other the effect. For
example, a high degree of correlation between price and demand for a
certain commodity at a particular point of time may not suggest which is
the cause and which is the effect. However, in regression analysis the cause
and effect relationship is clearly indicated: one variable is taken as
dependent and the other as independent.


The variable whose value is influenced is called the dependent vari- 
able and is denoted by Y ; the variable which exerts the influence is called 
the independent variable and is denoted by X. The value of the indepen- 
dent variable ‘explains’ the value of the dependent variable. For this 
reason the independent variable is often called the explanatory variable. 
If we are interested in finding out the relationship between the level of 
income and consumption, the level of income will influence the level of 
consumption and as such income is the independent variable and con- 
sumption the dependent variable. Traditionally the dependent variable is 
taken as Y variable and the independent variable as the X variable. 


LINEAR REGRESSION 

Variables may have either linear or non-linear relationship. Two 
variables are said to have linear relationship when change in the indepen- 
dent variable (say X) by one unit leads to constant absolute change in 
the dependent variable (Y). When two variables have a linear relationship,
the regression lines can be used to find out the values of the dependent vari-
able. When we plot two variables (say X and Y) on a scatter diagram
and draw two lines of best fit which pass through the plotted points, these
lines are called regression lines. In linear regression, these lines are straight
ones. These regression lines are based on two equations called regression
equations, which give the best estimate of one variable when the other is exactly
known or given.


Regression Lines 


If we take the case of two variables X and Y, we shall have two 
regression lines as the regression of X on Y and the regression of Y on X. 
The regression line of Y on X gives the most probable values of Y for given 
values of X and the regression line of X on Y gives the most probable 
values of X for given values of Y. However, when there is either perfect
positive or perfect negative correlation between the two variables (r = ±1),
the two regression lines will coincide, i.e., we will have only one line. The
farther the two regression lines are from each other, the lower is the degree of
correlation, and the nearer the two regression lines are to each other, the higher
is the degree of correlation. If the variables are independent, r is
zero and the lines of regression are at right angles, i.e., parallel to OX
and OY.


It should be noted that the regression lines cut each other at the
point of averages of X and Y, i.e., if from the point where both the regres-
sion lines cut each other a perpendicular is drawn on the X-axis, we will
get the mean value of X, and if from that point a horizontal line is drawn
on the Y-axis, we will get the mean value of Y.


It is important to note that the regression lines are drawn on the least
squares assumption, which stipulates that the sum of squares of the devia-
tions of the observed values from the fitted line shall be minimum.
The total of the squares of the deviations of the various points is minimum
only from the line of best fit. The deviations from the points to the line
of best fit can be measured in two ways: vertical, i.e., parallel to the Y-axis,
and horizontal, i.e., parallel to the X-axis. For minimising the totals of the
squares separately it is essential to have two regression lines. The regres-
sion line of Y on X is drawn in such a way that it minimises the total of the
squares of the vertical deviations, and the regression line of X on Y
minimises the total of the squares of the horizontal deviations. This can be best
appreciated with the help of the following example.


Height of father (inches), X   65  63  67  64  68  62  70  66  68  67  69  71
Height of son (inches), Y      68  66  68  65  69  66  68  65  71  67  68  70


The two regression equations corresponding to these variables are:
$X = -3.38 + 1.036Y$   ...(i)
$Y = 35.82 + 0.476X$   ...(ii)
By assuming any values of Y we can find out corresponding values
of X from Eq. (i).
For example, if Y = 65, X would be $-3.38 + 1.036(65) = 63.96$.
Similarly, if Y = 70, X would be $-3.38 + 1.036(70) = 69.14$.

We can plot these points on the graph and obtain the regression line of
X on Y.

Similarly, by assigning any values to X in Eq. (ii) we can obtain
corresponding values of Y. Thus if X = 63, Y would be
$35.82 + 0.476(63) = 65.808$, or 65.81,
and for X = 70, Y would be $35.82 + 0.476(70) = 69.14$.
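The two equations quoted for the height data can be reproduced from the deviation formulas $b_{yx} = \sum xy / \sum x^2$ and $b_{xy} = \sum xy / \sum y^2$, with each line passing through (X̄, Ȳ). A minimal Python sketch; the name `regression_lines` is ours.

```python
def regression_lines(xs, ys):
    """Return (intercept, slope) for Y on X and for X on Y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx, b_xy = sxy / sxx, sxy / syy
    # Both lines pass through the point of means (mx, my)
    return (my - b_yx * mx, b_yx), (mx - b_xy * my, b_xy)


father = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
son = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
(a1, b1), (a2, b2) = regression_lines(father, son)
print(f"Y = {a1:.2f} + {b1:.3f}X")  # Y = 35.82 + 0.476X
print(f"X = {a2:.2f} + {b2:.3f}Y")  # X = -3.38 + 1.036Y
```

The two slopes differ because one minimises vertical deviations and the other horizontal deviations, which is exactly why two lines are drawn.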
The graph of original data and these lines would be as follows:

[Figure: scatter diagram of height of son (inches) against height of father (inches), 62 to 71 inches, in two panels: the regression line of Y on X, for which Σ(Y − Y_c)² is minimum, and the regression line of X on Y, X = a + bY, for which Σ(X − X_c)² is minimum.]
Regression Equations


Regression equations also known as estimating equations are alge- 
braic expressions of the regression lines. Since there are two regression 
lines, there are two regression equations—the regression equation of X on 
Y is used to describe the variations in the values of X for given changes 
in Y and the regression equation of Y on X is used to describe the varia- 
tion in the values of Y for given changes in X. 


Regression Equation of Y on X 
The regression equation of Y on X is expressed as follows : 
$Y_c = a + bX$


In this equation a and b are constants (fixed numerical values) which 
determine the position of the line completely. These constants are called 
the parameters of the line. If the value of either or both of them is 
changed, another line is determined. The parameter ‘a’ determines the
level of the fitted line (i.e., the distance of the line directly above or below
the origin). The parameter ‘b’ determines the slope of the line, i.e., the
change in Y per unit change in X. The symbol Y_c stands for the value of
Y computed from the relationship for a given X.


If the values of the constants ‘a’ and ‘b’ are obtained, the line is com- 
pletely determined. But the question is how to obtain these values. The 
answer is provided by the method of Least Squares, which states that the
line should be drawn through the plotted points in such a manner that the
sum of the squares of the deviations of the actual Y values from the
computed Y values is the least, or, in other words, in order to obtain a line
which fits the points best, $\sum (Y - Y_c)^2$ should be minimum. Such a line is
known as the line of ‘best fit’.


A straight line fitted by least squares has the following characteristics : 
1. It gives the best fit to the data in the sense that it makes the sum
of the squared deviations from the line, $\sum (Y - Y_c)^2$, smaller than they
would be from any other straight line. This property accounts for the
name ‘Least Squares’.

2. The deviations above the line equal those below the line, on the
average. This means that the total of the positive and negative deviations
is zero, or $\sum (Y - Y_c) = 0$.

3. The straight line goes through the overall mean of the data
(X̄, Ȳ).

4. When the data represent a sample from a larger population,
the least squares line is a ‘best’ estimate of the population regression
line.


With a little algebra and differential calculus it can be shown that
the following two equations, if solved simultaneously, will yield values of
the parameters a and b such that the least squares requirement is
fulfilled*:
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$

These equations are usually called the normal equations. In the
equations ΣX, ΣY, ΣXY, ΣX² indicate totals which are computed from
the observed pairs of values of the two variables X and Y to which the least
squares estimating line is to be fitted, and N is the number of observed
pairs of values.
Regression Equation of X on Y

The regression equation of X on Y is expressed as follows:
$X_c = a + bY$
To determine the values of a and b the following two normal
equations are to be solved simultaneously:

$\sum X = Na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$
Illustration 1. From the following data obtain the two regression equations:
X    6    2   10    4    8
Y    9   11    5    8    7
(I.C.W.A., 1974)

Solution: OBTAINING REGRESSION EQUATIONS

X     Y     XY     X²     Y²
6     9     54     36     81
2    11     22      4    121
10    5     50    100     25
4     8     32     16     64
8     7     56     64     49
ΣX = 30   ΣY = 40   ΣXY = 214   ΣX² = 220   ΣY² = 340


Regression equation of Y on X: $Y_c = a + bX$
To determine the values of a and b the following two normal equations are to
be solved:
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Substituting the values:
$40 = 5a + 30b$   ...(i)
$214 = 30a + 220b$   ...(ii)
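*We require $W = \sum (Y - Y_c)^2 = \sum (Y - a - bX)^2$ to be minimum.
Differentiating W with respect to a and b, we have
$\frac{\partial W}{\partial a} = -2\sum (Y - a - bX)$
and $\frac{\partial W}{\partial b} = -2\sum X(Y - a - bX) = -2\sum (XY - aX - bX^2)$
For W to be minimum, $\frac{\partial W}{\partial a}$ and $\frac{\partial W}{\partial b}$ must both equal 0, which they will do
when $\sum (Y - a - bX) = 0$ and $\sum (XY - aX - bX^2) = 0$,
i.e., when $\sum Y = Na + b\sum X$
and $\sum XY = a\sum X + b\sum X^2$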
Multiplying equation (i) by 6: $240 = 30a + 180b$   ...(iii)
$214 = 30a + 220b$   ...(iv)
Deducting equation (iv) from (iii): $26 = -40b$, or $b = -0.65$
Substituting the value of b in equation (i):
$40 = 5a + 30(-0.65)$, or $5a = 40 + 19.5 = 59.5$, or $a = 11.9$
Putting the values of a and b in the equation, the regression of Y on X is
$Y = 11.9 - 0.65X$
Regression line of X on Y: $X_c = a + bY$
and the two normal equations are:
$\sum X = Na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$
$30 = 5a + 40b$   ...(i)
$214 = 40a + 340b$   ...(ii)
Multiplying equation (i) by 8: $240 = 40a + 320b$   ...(iii)
$214 = 40a + 340b$   ...(iv)
Deducting equation (iv) from (iii): $26 = -20b$, or $b = -1.3$
Substituting the value of b in equation (i): $30 = 5a + 40(-1.3)$
$5a = 30 + 52 = 82$, ∴ $a = 16.4$
Putting the values of a and b in the equation, the regression line of X on Y is
$X = 16.4 - 1.3Y$
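The two normal equations are a 2 × 2 linear system, so they can be solved in one step. The Python sketch below uses Cramer's rule (our choice of method; the book eliminates a by multiplying and subtracting); `fit_by_normal_equations` is a hypothetical name.

```python
def fit_by_normal_equations(xs, ys):
    """Solve sum(Y) = N a + b sum(X) and sum(XY) = a sum(X) + b sum(X^2)
    for a and b by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx  # determinant of the coefficient matrix
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
print(fit_by_normal_equations(X, Y))  # Y on X: (11.9, -0.65)
print(fit_by_normal_equations(Y, X))  # X on Y: (16.4, -1.3)
```

Swapping the arguments gives the regression of X on Y, because its normal equations have exactly the same shape with the roles of X and Y exchanged.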


Deviations taken from Arithmetic Means of X and Y 


The above method of finding out regression equations is tedious. 
The calculations can very much be simplified if instead of dealing with 
the actual values of X and Y we take the deviations of X and Y series from 
their respective means. In such a case the two regression equations are 
written as follows : 


(i) Regression equation of X on Y

$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
X̄ is the mean of X series
Ȳ is the mean of Y series

$r\frac{\sigma_x}{\sigma_y}$ is known as the regression coefficient of X on Y.
The regression coefficient of X on Y is denoted by the symbol b_xy.
It measures the change in X corresponding to a unit change in Y. When
deviations are taken from the means of X and Y, the regression coefficient
of X on Y is obtained as follows:

$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}$
Instead of finding out the values of the correlation coefficient, σx, σy, etc., we
can find the value of the regression coefficient by calculating Σxy and Σy²
and dividing the former by the latter:

$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{N\sigma_x\sigma_y} \times \frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{N\sigma_y^2} = \frac{\sum xy}{\sum y^2}$
(ii) Regression equation of Y on X
$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$r\frac{\sigma_y}{\sigma_x}$ is the regression coefficient of Y on X. It is denoted by b_yx.
It measures the change in Y corresponding to a unit change in X. When
deviations are taken from actual means, the regression coefficient of Y on X
can be obtained as follows:
$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$

It should be noted that the square root of the product of the two regression
coefficients gives us the value of the correlation coefficient. Symbolically:
$r = \sqrt{b_{xy} \times b_{yx}}$
Proof
$b_{xy} = r\frac{\sigma_x}{\sigma_y}$ and $b_{yx} = r\frac{\sigma_y}{\sigma_x}$
$b_{xy} \times b_{yx} = r\frac{\sigma_x}{\sigma_y} \times r\frac{\sigma_y}{\sigma_x} = r^2$, ∴ $r = \sqrt{b_{xy} \times b_{yx}}$
v с; 


The following points should be noted about the regression
coefficients:

1. Both the regression coefficients will have the same sign, i.e.,
either both will be positive or both negative. It is never possible that one of
the regression coefficients is negative and the other positive.

2. Since the value of the coefficient of correlation (r) cannot exceed
one, one of the regression coefficients must be numerically less than one, or, in other
words, both the regression coefficients cannot be greater than one. For
example, if b_xy = 1.2 and b_yx = 1.4, the value of the correlation coefficient would
be √(1.2 × 1.4) = 1.296, which is not possible.

3. The coefficient of correlation will have the same sign as that of the
regression coefficients, i.e., if the regression coefficients have a negative sign, r
will also be negative, and if the regression coefficients have a positive sign, r
would also be positive. For example, if b_xy = −0.8 and b_yx = −1.2, r
would be −√(0.8 × 1.2) = −0.98 and not +0.98.

4. Since $b_{xy} = r\frac{\sigma_x}{\sigma_y}$, we can find out any of the four values given
the other three. For example, if we know that r = 0.6, σx = 4 and b_xy = 0.8,
we can find σy:
$\sigma_y = \frac{r\,\sigma_x}{b_{xy}}$
Substituting the given values,
$\sigma_y = \frac{0.6 \times 4}{0.8} = \frac{2.4}{0.8} = 3$
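Points 1 to 4 can be turned into checks. The Python sketch below (hypothetical helper names) rejects coefficients of opposite sign or with a product above one, attaches the common sign to r, and rearranges $b_{xy} = r\sigma_x/\sigma_y$ for σy.

```python
import math


def r_from_regression_coefficients(b_xy, b_yx):
    """r = +/- sqrt(b_xy * b_yx), taking the common sign of the coefficients."""
    if b_xy * b_yx < 0:
        raise ValueError("regression coefficients must have the same sign")
    product = b_xy * b_yx
    if product > 1:
        raise ValueError("b_xy * b_yx exceeds 1, so r would exceed 1")
    return math.copysign(math.sqrt(product), b_xy)


def sigma_y_from(r, sigma_x, b_xy):
    """Rearranged form of b_xy = r * sigma_x / sigma_y."""
    return r * sigma_x / b_xy


print(round(r_from_regression_coefficients(-0.8, -1.2), 2))  # -0.98
print(round(sigma_y_from(0.6, 4, 0.8), 2))                    # 3.0
```

Calling `r_from_regression_coefficients(1.2, 1.4)` raises `ValueError`, matching point 2.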
Illustration 2. From the data of Illustration 1, calculate the regression equations
taking deviations of items from the means of X and Y series.

Solution: CALCULATION OF REGRESSION EQUATIONS

X     (X − X̄)   x²    Y     (Y − Ȳ)   y²    xy
      x                     y
6      0         0     9     +1        1      0
2     −4        16    11     +3        9    −12
10    +4        16     5     −3        9    −12
4     −2         4     8      0        0      0
8     +2         4     7     −1        1     −2
ΣX = 30   Σx = 0   Σx² = 40   ΣY = 40   Σy = 0   Σy² = 20   Σxy = −26

Regression equation of X on Y: $X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{-26}{20} = -1.3$
$\bar{X} = \frac{30}{5} = 6$; $\bar{Y} = \frac{40}{5} = 8$
$X - 6 = -1.3(Y - 8) = -1.3Y + 10.4$

Hence
$X = -1.3Y + 16.4$, or $X = 16.4 - 1.3Y$

Regression equation of Y on X: $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-26}{40} = -0.65$
$Y - 8 = -0.65(X - 6) = -0.65X + 3.9$
$Y = -0.65X + 11.9$, or $Y = 11.9 - 0.65X$

Thus we find that the answer is the same as obtained earlier. However, the
calculations are very much simplified without the use of normal equations.


Deviations taken from Assumed Means

When the actual means of X and Y variables are in fractions, the calcu-
lations can be simplified by taking the deviations from assumed means.
When deviations are taken from assumed means, the entire procedure of
finding regression equations remains the same; the only difference is that
instead of taking deviations from actual means, we take the deviations from
assumed means. The two regression equations are:

$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$

The value of $r\frac{\sigma_x}{\sigma_y}$ will now be obtained as follows:

$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}$
where $d_x = (X - A)$ and $d_y = (Y - B)$, A and B being the assumed means of X and Y.
Similarly, the regression equation of Y on X is

$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$

$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}$
It should be noted that in both cases the numerator is the same; the
only difference is in the denominator. When the regression coefficients
are calculated from a correlation table, their values are obtained as follows:

$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum f d_x d_y - \frac{\sum f d_x \times \sum f d_y}{N}}{\sum f d_y^2 - \frac{(\sum f d_y)^2}{N}} \times \frac{i_x}{i_y}$
where i_x = class interval of X variable, and
i_y = class interval of Y variable.
Similarly, $b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum f d_x d_y - \frac{\sum f d_x \times \sum f d_y}{N}}{\sum f d_x^2 - \frac{(\sum f d_x)^2}{N}} \times \frac{i_y}{i_x}$
As is clear, the formulae are the same; the only difference is that in a
correlation table we are given frequencies also, so every value has been
multiplied by f.
Illustration 4. From the data of Illustration 1, obtain regression equations
taking deviations from 5 in case of X and 7 in case of Y.
Solution: CALCULATION OF REGRESSION EQUATIONS

X     (X − 5)   d_x²   Y     (Y − 7)   d_y²   d_xd_y
      d_x                    d_y
6     +1         1     9     +2         4      +2
2     −3         9    11     +4        16     −12
10    +5        25     5     −2         4     −10
4     −1         1     8     +1         1      −1
8     +3         9     7      0         0       0
ΣX = 30   Σd_x = 5   Σd_x² = 45   ΣY = 40   Σd_y = 5   Σd_y² = 25   Σd_xd_y = −21

Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$
$b_{xy} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_y^2 - \frac{(\sum d_y)^2}{N}} = \frac{-21 - \frac{5 \times 5}{5}}{25 - \frac{(5)^2}{5}} = \frac{-26}{20} = -1.3$
$\bar{X} = \frac{30}{5} = 6$; $\bar{Y} = \frac{40}{5} = 8$

* Regression coefficients are independent of change of origin but not of scale.
So the regression equation is
$X - 6 = -1.3(Y - 8) = -1.3Y + 10.4$, or $X = 16.4 - 1.3Y$

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$
$b_{yx} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_x^2 - \frac{(\sum d_x)^2}{N}} = \frac{-21 - \frac{5 \times 5}{5}}{45 - \frac{(5)^2}{5}} = \frac{-26}{40} = -0.65$

So the regression equation is
$Y - 8 = -0.65(X - 6) = -0.65X + 3.9$, or $Y = 11.9 - 0.65X$.
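The footnote's claim, that regression coefficients do not depend on the choice of origin, can be checked directly on the data of Illustration 4. A Python sketch with a hypothetical helper name; it uses the assumed-mean formulas above for several different origins.

```python
def coefficients_from_assumed_means(xs, ys, ax, ay):
    """b_xy and b_yx from deviations about assumed means ax (for X) and ay (for Y)."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    # The numerator is common to both coefficients
    numerator = sum(p * q for p, q in zip(dx, dy)) - sum_dx * sum_dy / n
    b_xy = numerator / (sum(q * q for q in dy) - sum_dy ** 2 / n)
    b_yx = numerator / (sum(p * p for p in dx) - sum_dx ** 2 / n)
    return b_xy, b_yx


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
for origin in [(5, 7), (6, 8), (0, 0)]:
    print(origin, coefficients_from_assumed_means(X, Y, *origin))
# each line shows b_xy = -1.3 and b_yx = -0.65
```

The origin (6, 8) is the pair of actual means and (0, 0) amounts to working with the raw values, so the same function covers all three methods of this section.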
Graphing Regression Lines. It is quite easy to graph the
regression lines once they have been computed. All one has to do is:
(a) choose any two values (preferably well apart) for the known
variable on the right-hand side of the equation,
(b) compute the other variable,
(c) plot the two pairs of values, and
(d) draw a straight line through the plotted points.
Illustration 5. Show graphically the regression equations of Illustration 3.
Solution. (a) Regression line of Y on X [Y = 11.9 − 0.65X]
(i) Let X = 2; Y = 11.9 − 0.65(2) = 11.9 − 1.30 = 10.6
(ii) Let X = 10; Y = 11.9 − 0.65 × 10 = 5.4

These points and the regression line through them are shown in the following
graph:

(b) Regression line of X on Y [X = 16.4 − 1.3Y]

(i) Let Y = 10; X = 16.4 − 1.3(10) = 16.4 − 13 = 3.4
(ii) Let Y = 6; X = 16.4 − 1.3(6) = 16.4 − 7.8 = 8.6

These points and the regression line through them are shown in the graph
above.

Thus the value of the regression coefficient comes out to be the same.
Illustration 6. Given the bivariate data:

X    1   5   3   2   1   1   7   3
Y    6   1   0   0   1   2   1   5
(a) Fit a regression line of Y on X and thence predict Y if X = 5.
(b) Fit a regression line of X on Y and thence predict X if Y = 2.5.
(c) Calculate Karl Pearson’s correlation coefficient.
(M. Com., Delhi, 1974)


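Solution:

X    (X − 3)   d_x²   Y    (Y − 2)   d_y²   d_xd_y
     d_x                   d_y
1    −2         4     6    +4        16      −8
5    +2         4     1    −1         1      −2
3     0         0     0    −2         4       0
2    −1         1     0    −2         4      +2
1    −2         4     1    −1         1      +2
1    −2         4     2     0         0       0
7    +4        16     1    −1         1      −4
3     0         0     5    +3         9       0
ΣX = 23   Σd_x = −1   Σd_x² = 33   ΣY = 16   Σd_y = 0   Σd_y² = 36   Σd_xd_y = −10

Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$
$b_{xy} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_y^2 - \frac{(\sum d_y)^2}{N}} = \frac{-10 - \frac{(-1)(0)}{8}}{36 - \frac{(0)^2}{8}} = \frac{-10}{36} = -0.278$
$\bar{X} = \frac{23}{8} = 2.875$; $\bar{Y} = \frac{16}{8} = 2$
Substituting the values in the equation,
$X - 2.875 = -0.278(Y - 2) = -0.278Y + 0.556$, or $X = 3.431 - 0.278Y$
If Y = 2.5, X is equal to 3.431 − (0.278 × 2.5) = 3.431 − 0.695 = 2.736
Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$
$b_{yx} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_x^2 - \frac{(\sum d_x)^2}{N}} = \frac{-10 - \frac{(-1)(0)}{8}}{33 - \frac{(-1)^2}{8}} = \frac{-10}{32.875} = -0.304$
$Y - 2 = -0.304(X - 2.875) = -0.304X + 0.874$, or $Y = 2.874 - 0.304X$
If X = 5, Y is equal to 2.874 − (0.304 × 5) = 2.874 − 1.52 = 1.354

$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{(0.278)(0.304)} = -0.291$ (negative, since both regression coefficients are negative)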
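The three parts of this illustration can be reproduced together. The Python sketch below (hypothetical name `fit_both_lines`) returns predictor functions for both lines and gives r the common sign of the regression coefficients, as in point 3 above.

```python
import math


def fit_both_lines(xs, ys):
    """Return predictors for Y on X and X on Y, plus r with its sign."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx, b_xy = sxy / sxx, sxy / syy

    def y_on_x(x):
        return mean_y + b_yx * (x - mean_x)

    def x_on_y(y):
        return mean_x + b_xy * (y - mean_y)

    # r takes the common sign of the two regression coefficients
    r = math.copysign(math.sqrt(b_yx * b_xy), sxy)
    return y_on_x, x_on_y, r


X = [1, 5, 3, 2, 1, 1, 7, 3]
Y = [6, 1, 0, 0, 1, 2, 1, 5]
y_on_x, x_on_y, r = fit_both_lines(X, Y)
print(round(y_on_x(5), 3))    # 1.354
print(round(x_on_y(2.5), 3))  # 2.736
print(round(r, 3))            # -0.291
```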


Illustration 7. On the basis of figures recorded below for ‘Supply’ and ‘Price’
for nine years, build a regression of ‘Price’ on ‘Supply’. Calculate, from the equation
established, the most likely price when supply = 90.

Year      1968  1969  1970  1971  1972  1973  1974  1975  1976
Supply      80    82    86    91    83    85    89    96    93
Price      145   140   130   124   133   127   120   110   116

Solution. Let price be denoted by Y and supply by X. The regression equation
of Y on X is:

$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
Year   Supply   (X − 90)   d_x²   Price   (Y − 127)   d_y²   d_xd_y
       X        d_x               Y       d_y
1968    80      −10        100    145     +18         324    −180
1969    82       −8         64    140     +13         169    −104
1970    86       −4         16    130      +3           9     −12
1971    91       +1          1    124      −3           9      −3
1972    83       −7         49    133      +6          36     −42
1973    85       −5         25    127       0           0       0
1974    89       −1          1    120      −7          49      +7
1975    96       +6         36    110     −17         289    −102
1976    93       +3          9    116     −11         121     −33
N = 9   ΣX = 785   Σd_x = −25   Σd_x² = 301   ΣY = 1145   Σd_y = +2   Σd_y² = 1006   Σd_xd_y = −469

$r\frac{\sigma_y}{\sigma_x} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_x^2 - \frac{(\sum d_x)^2}{N}} = \frac{-469 - \frac{(-25)(2)}{9}}{301 - \frac{(-25)^2}{9}} = \frac{-463.44}{231.56} = -2.00$
$\bar{X} = 90 + \frac{-25}{9} = 87.22$; $\bar{Y} = 127 + \frac{2}{9} = 127.22$

$Y - 127.22 = -2.00(X - 87.22) = -2.00X + 174.44$, or $Y = 301.66 - 2.00X$
When X is 90, $Y = 301.66 - 2.00(90) = 121.66$
Thus the most likely price when supply is 90 is Rs. 121.66.
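Carrying the regression coefficient without early rounding is what fixes the estimate at about Rs. 121.66. A Python sketch (hypothetical name `predict_y_on_x`) that uses the actual means directly:

```python
def predict_y_on_x(xs, ys, x_new):
    """Slope b_yx and least-squares estimate of Y at x_new."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b_yx = sxy / sxx
    return b_yx, mean_y + b_yx * (x_new - mean_x)


supply = [80, 82, 86, 91, 83, 85, 89, 96, 93]
price = [145, 140, 130, 124, 133, 127, 120, 110, 116]
slope, estimate = predict_y_on_x(supply, price, 90)
print(round(slope, 2), round(estimate, 2))  # -2.0 121.66
```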
Illustration 8. Given that the means of X and Y are 65 and 67, their standard deviations are 2.5 and 3.5 respectively, and the coefficient of correlation between them is 0.8:
(i) Write down the two regression lines.
(ii) Obtain the best estimate of X when Y = 70.
(iii) Using the estimated value of X as the given value of X, estimate the corresponding value of Y. (B.A., Madurai, 1974)

Solution:
Regression line of Y on X: $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$, where $\bar{Y} = 67$, $\bar{X} = 65$, $\sigma_y = 3.5$, $\sigma_x = 2.5$, $r = 0.8$.

$$Y - 67 = 0.8 \times \frac{3.5}{2.5}(X - 65) = 1.12(X - 65) = 1.12X - 72.8 \quad\text{or}\quad Y = 1.12X - 5.8$$

Regression line of X on Y:

$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) = 0.8 \times \frac{2.5}{3.5}(Y - 67) = 0.571(Y - 67) = 0.571Y - 38.257 \quad\text{or}\quad X = 0.571Y + 26.743$$

(ii) The best estimate of X when Y = 70 can be obtained from the regression equation of X on Y:

$$X = 0.571(70) + 26.743 = 39.97 + 26.743 = 66.713$$

(iii) When X = 66.713, $Y = 1.12(66.713) - 5.8 = 74.72 - 5.8 = 68.92$.
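Both regression lines follow mechanically from the two means, the two standard deviations and r. The sketch below (the helper name `regression_lines` is illustrative, not from the text) returns slope and intercept for each line and then repeats the two-step estimate of parts (ii) and (iii).

```python
def regression_lines(x_mean, y_mean, sd_x, sd_y, r):
    """Return ((slope, intercept) of Y on X, (slope, intercept) of X on Y)."""
    b_yx = r * sd_y / sd_x          # regression coefficient of Y on X
    b_xy = r * sd_x / sd_y          # regression coefficient of X on Y
    y_on_x = (b_yx, y_mean - b_yx * x_mean)
    x_on_y = (b_xy, x_mean - b_xy * y_mean)
    return y_on_x, x_on_y


(byx, ay), (bxy, ax) = regression_lines(65, 67, 2.5, 3.5, 0.8)
x_est = bxy * 70 + ax       # best estimate of X when Y = 70 (X-on-Y line)
y_back = byx * x_est + ay   # feed that X into the Y-on-X line
print(round(byx, 2), round(ay, 2))         # 1.12 -5.8
print(round(x_est, 3), round(y_back, 2))   # 66.714 68.92
```

The third decimal of X differs from the text's 66.713 only because the text rounds $b_{xy}$ to 0.571 before substituting.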
Illustration 9. The correlation coefficient between two variables X and Y is 0.6. If $S_x = 1.50$, $S_y = 2.00$ (S denotes standard deviation), $\bar{X} = 10$ and $\bar{Y} = 20$, find the equations of the regression lines of X on Y and Y on X. (B.A., Madras, 1974)

Solution. Regression equation of X on Y is given by:

$$X - \bar{X} = r\frac{S_x}{S_y}(Y - \bar{Y})$$
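We are given $\bar{X} = 10$, $r = 0.6$, $\bar{Y} = 20$, $S_x = 1.5$, $S_y = 2$. Substituting the values:

$$X - 10 = 0.6 \times \frac{1.5}{2}(Y - 20) = 0.45(Y - 20) = 0.45Y - 9 \quad\text{or}\quad X = 1 + 0.45Y$$

Regression equation of Y on X is given by $Y - \bar{Y} = r\frac{S_y}{S_x}(X - \bar{X})$. Substituting the values:

$$Y - 20 = 0.6 \times \frac{2}{1.5}(X - 10) = 0.8(X - 10) = 0.8X - 8 \quad\text{or}\quad Y = 12 + 0.8X$$

Thus the two regression equations are X = 1 + 0.45Y and Y = 12 + 0.8X.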


Illustration 9. For 50 students of a class the regression equation of marks in Statistics (X) on the marks in Accountancy (Y) is 3Y − 5X + 180 = 0. The mean marks in Accountancy are 44 and the variance of marks in Statistics is 9/16th of the variance of marks in Accountancy. Find the mean marks in Statistics and the coefficient of correlation between marks in the two subjects. (Advanced Business Statistics, Rajasthan, 1974)

Solution. We are given

$$3Y - 5X + 180 = 0 \quad\text{or}\quad 3Y + 180 = 5X$$

X represents marks in Statistics and Y, marks in Accountancy. Since the regression line passes through the point of means, when Y = 44 the value of X will be the mean marks in Statistics:

$$5X = (3)(44) + 180 = 132 + 180 = 312 \quad\text{or}\quad X = 62.4$$

Hence the mean marks in Statistics are 62.4.

For calculating the coefficient of correlation, we know that

$$b_{xy} = r\frac{\sigma_x}{\sigma_y}$$

The regression coefficient of X on Y from the given equation is

$$5X = 3Y + 180 \quad\text{or}\quad X = 0.6Y + 36, \qquad\text{so}\quad b_{xy} = 0.6$$

Since $\sigma_x^2 = \frac{9}{16}\sigma_y^2$, we have $\frac{\sigma_x}{\sigma_y} = \sqrt{\frac{9}{16}} = \frac{3}{4}$, so

$$0.6 = r \times \frac{3}{4} \quad\text{or}\quad 3r = 2.4, \qquad r = +0.8$$
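The same two steps (substitute the known mean into the line of X on Y, then divide the regression coefficient by the ratio of standard deviations) can be sketched in Python. The helper name and the convention of writing the line as `cx*X + cy*Y + c0 = 0` are assumptions made for this illustration.

```python
import math

def mean_and_r_from_x_on_y(cx, cy, c0, y_mean, var_ratio_x_to_y):
    """Line cx*X + cy*Y + c0 = 0 is the regression of X on Y.

    Returns (x_mean, b_xy, r) given the mean of Y and the ratio var(X)/var(Y).
    """
    b_xy = -cy / cx                       # slope when the line is solved for X
    x_mean = -(cy * y_mean + c0) / cx     # the line passes through (x_mean, y_mean)
    # b_xy = r * sd_x / sd_y  =>  r = b_xy / (sd_x / sd_y)
    r = b_xy / math.sqrt(var_ratio_x_to_y)
    return x_mean, b_xy, r


# 3Y - 5X + 180 = 0  ->  cx = -5, cy = 3, c0 = 180
x_mean, b_xy, r = mean_and_r_from_x_on_y(-5, 3, 180, 44, 9 / 16)
print(round(x_mean, 1), round(b_xy, 2), round(r, 2))   # 62.4 0.6 0.8
```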


Illustration 10. The following scores were worked out from a test in Mathematics and English in an annual examination:

                        Scores in Mathematics (X)   Scores in English (Y)
Mean                              39.5                      47.5
Standard Deviation                10.8                      16.8
r = +0.42

Find both the regression equations. Using these regressions estimate the value of Y for X = 50 and the value of X for Y = 30. (M. Com., Delhi, 1974)

Solution:

Regression equation of X on Y: $X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$, where $\bar{X} = 39.5$, $\bar{Y} = 47.5$, $r = 0.42$, $\sigma_x = 10.8$, $\sigma_y = 16.8$.

Substituting these values:

$$X - 39.5 = 0.42 \times \frac{10.8}{16.8}(Y - 47.5) = 0.27(Y - 47.5) = 0.27Y - 12.82$$

$$X = 0.27Y - 12.82 + 39.5 = 0.27Y + 26.68$$

When Y = 30, the value of X is (0.27 × 30) + 26.68 = 34.78.

Regression equation of Y on X: $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$.

Substituting these values:

$$Y - 47.5 = 0.42 \times \frac{16.8}{10.8}(X - 39.5) = 0.653(X - 39.5) = 0.653X - 25.79$$

$$Y = 0.653X - 25.79 + 47.5 = 0.653X + 21.71$$

When X = 50, the value of Y is (0.653 × 50) + 21.71 = 32.65 + 21.71 = 54.36.

Thus the regression equations are

X = 0.27Y + 26.68
Y = 0.653X + 21.71

The value of X when Y = 30 is 34.78 and the value of Y when X = 50 is 54.36.


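Regression Equations in case of Correlation Table

When we are given a correlation table and are asked to find the regression equations, the procedure remains the same as discussed above. However, for finding the regression equations of Y on X and X on Y the convenient forms are $Y - \bar{Y} = b_{yx}(X - \bar{X})$ and $X - \bar{X} = b_{xy}(Y - \bar{Y})$. It may be noted that the regression coefficients are independent of change of origin but not of scale, and hence a necessary adjustment for the class intervals must be made:

$$b_{yx} = \frac{\Sigma f d_x d_y - \frac{(\Sigma f d_x)(\Sigma f d_y)}{N}}{\Sigma f d_x^2 - \frac{(\Sigma f d_x)^2}{N}} \times \frac{i_y}{i_x}, \qquad b_{xy} = \frac{\Sigma f d_x d_y - \frac{(\Sigma f d_x)(\Sigma f d_y)}{N}}{\Sigma f d_y^2 - \frac{(\Sigma f d_y)^2}{N}} \times \frac{i_x}{i_y}$$

where $i_x$ and $i_y$ are the class intervals of X and Y. The following example will illustrate the procedure: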


Illustration 11. Obtain the regression equation of Y on X and X on Y and the value of r from the following table giving the marks in Accountancy and Statistics:

[Correlation table: Marks in Statistics (rows) against Marks in Accountancy (columns)]
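Solution: COMPUTATIONS REQUIRED FOR FINDING THE REGRESSION EQUATIONS

Regression equation of Y on X: $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$

$$r\frac{\sigma_y}{\sigma_x} = \frac{\Sigma f d_x d_y - \frac{(\Sigma f d_x)(\Sigma f d_y)}{N}}{\Sigma f d_x^2 - \frac{(\Sigma f d_x)^2}{N}} \times \frac{C_y}{C_x} = \frac{37 - \frac{42 \times 12}{60}}{72 - \frac{(42)^2}{60}} \times \frac{10}{10} = \frac{37 - 8.4}{72 - 29.4} = \frac{28.6}{42.6} = 0.67$$

$$\bar{Y} = A + \frac{\Sigma f d_y}{N} \times C, \text{ where } A = 25,\ \Sigma f d_y = 12,\ N = 60,\ C = 10: \quad \bar{Y} = 25 + \frac{12}{60} \times 10 = 27$$

$$\bar{X} = A + \frac{\Sigma f d_x}{N} \times C, \text{ where } A = 20,\ \Sigma f d_x = 42,\ N = 60,\ C = 10: \quad \bar{X} = 20 + \frac{42}{60} \times 10 = 27$$

$$Y - 27 = 0.67(X - 27) = 0.67X - 18.09 \quad\text{or}\quad Y = 0.67X + 8.91$$

Regression equation of X on Y: $X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$

$$r\frac{\sigma_x}{\sigma_y} = \frac{\Sigma f d_x d_y - \frac{(\Sigma f d_x)(\Sigma f d_y)}{N}}{\Sigma f d_y^2 - \frac{(\Sigma f d_y)^2}{N}} \times \frac{C_x}{C_y} = \frac{37 - 8.4}{70 - \frac{(12)^2}{60}} = \frac{28.6}{70 - 2.4} = \frac{28.6}{67.6} = 0.423$$

$$X - 27 = 0.423(Y - 27) = 0.423Y - 11.42 \quad\text{or}\quad X = 0.423Y + 15.58$$

Correlation Coefficient: $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.423 \times 0.67} = 0.532$.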
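The grouped-data formulas only need the column totals of the correlation table. The sketch below is a hypothetical helper (`grouped_regression` is not from the text) that takes those totals together with the class intervals, so that the scale adjustment $i_y/i_x$ is applied explicitly; the totals passed in are the ones used in Illustration 11.

```python
import math

def grouped_regression(n, s_fdx, s_fdy, s_fdx2, s_fdy2, s_fdxdy, i_x=1.0, i_y=1.0):
    """Regression coefficients from a correlation table.

    d_x, d_y are step deviations (in class-interval units); i_x, i_y are the
    class intervals used to rescale, because the coefficients depend on scale.
    """
    num = s_fdxdy - s_fdx * s_fdy / n
    b_yx = num / (s_fdx2 - s_fdx ** 2 / n) * (i_y / i_x)
    b_xy = num / (s_fdy2 - s_fdy ** 2 / n) * (i_x / i_y)
    # r carries the sign of the common numerator (the covariance term)
    r = math.copysign(math.sqrt(b_yx * b_xy), num)
    return b_yx, b_xy, r


b_yx, b_xy, r = grouped_regression(60, 42, 12, 72, 70, 37, i_x=10, i_y=10)
print(round(b_yx, 2), round(b_xy, 3), round(r, 3))   # 0.67 0.423 0.533
```

The third decimal of r (0.533 here against 0.532 in the text) differs only because the text multiplies the already-rounded coefficient 0.67.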


Standard Error of Estimate*

With the help of regression equations perfect prediction is practically impossible. For example, the revenue for the year from gasoline sales (Y) based on automobile registration (X) as of a certain date would no doubt be approximated fairly closely, but the prediction would not be exact to the nearest rupee nor probably to the nearest thousand rupees. What is needed, then, is a measure which would indicate how precise the prediction of Y is, based on X or, conversely, how inaccurate the prediction might be. This measure is called the standard error of estimate. The standard error of estimate, symbolised by $S_{yx}$, is the same concept as the standard deviation discussed in Chapter 8. The standard deviation measures the dispersion about an average, such as the mean. The standard error of estimate measures the dispersion about an average line, called the regression line. The formula for calculating the standard error of estimate is:

$$S_{yx} = \sqrt{\frac{\Sigma (Y - Y_c)^2}{N - 2}}$$

where $S_{yx}$ = the standard error of regression of Y values from $Y_c$, the values of Y computed from the regression equation of Y on X.

This formula is not convenient from the computational point of view because it requires the computation of $(Y - Y_c)^2$ for every observation. A more convenient formula is

$$S_{yx} = \sqrt{\frac{\Sigma Y^2 - a\Sigma Y - b\Sigma XY}{N - 2}}$$

where a and b are the constants of the regression equation of Y on X, $Y_c = a + bX$.

The standard error of regression of X values from $X_c$ is

$$S_{xy} = \sqrt{\frac{\Sigma (X - X_c)^2}{N - 2}} = \sqrt{\frac{\Sigma X^2 - a\Sigma X - b\Sigma XY}{N - 2}}$$

where $S_{xy}$ = the standard error of regression of X values from $X_c$, and a and b are now the constants of the regression equation of X on Y, $X_c = a + bY$.

* In the formula for calculating the standard error of estimate, the sum of squares is divided by N − 2. The usual explanation given for this division by N − 2 is that the two constants a and b were calculated on the basis of the original data and we, thus, lose two degrees of freedom. To justify this theoretically is beyond the scope of this text. The degrees of freedom are the number of values which can be assigned at will without violating any of the restrictions imposed. For details please refer to the Chapter on 'Tests of Significance'.
The standard error of estimate measures the accuracy of the estimated values. The smaller the value of the standard error of estimate, the closer will be the dots to the regression line and the better the estimates based on the equation for this line. If the standard error of estimate is zero, then there is no variation about the line and the correlation will be perfect. Thus with the help of the standard error of estimate it is possible for us to ascertain how good and representative the regression line is as a description of the average relationship between two series.


Illustration 12. Given the following data:

X    6    2   10    4    8
Y    9   11    5    8    7

Find the two regression equations and calculate the standard errors of estimate ($S_{yx}$ and $S_{xy}$).

Solution. Refer to Illustration 2, page E-11.9; the two regression equations are:

Y = 11.9 − 0.65X  and  X = 16.4 − 1.3Y

From the regression equation of Y on X we can find, for the various values of X, the corresponding computed values $Y_c$, and from the equation of X on Y we can find the values $X_c$. These values are as follows:

 X    Y    Yc    Xc   (Y − Yc)²   (X − Xc)²
 6    9   8.0   4.7     1.00        1.69
 2   11  10.6   2.1     0.16        0.01
10    5   5.4   9.9     0.16        0.01
 4    8   9.3   6.0     1.69        4.00
 8    7   6.7   7.3     0.09        0.49
ΣX = 30   ΣY = 40   ΣYc = 40   ΣXc = 30   Σ(Y − Yc)² = 3.10   Σ(X − Xc)² = 6.20

$$S_{yx} = \sqrt{\frac{\Sigma (Y - Y_c)^2}{N - 2}} = \sqrt{\frac{3.10}{3}} = \sqrt{1.033} = 1.02$$

$$S_{xy} = \sqrt{\frac{\Sigma (X - X_c)^2}{N - 2}} = \sqrt{\frac{6.20}{3}} = \sqrt{2.067} = 1.44$$
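Both forms of the standard-error formula can be checked against Illustration 12 with a small Python sketch. The helper names `fit_line`, `std_error_of_estimate` and `std_error_shortcut` are illustrative; swapping the argument order gives $S_{xy}$ about the line of X on Y.

```python
import math

def fit_line(xs, ys):
    """Least-squares line ys = a + b*xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def std_error_of_estimate(xs, ys):
    """S_yx = sqrt(Σ(Y − Yc)² / (N − 2)) about the regression of ys on xs."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

def std_error_shortcut(xs, ys):
    """Same quantity via sqrt((ΣY² − aΣY − bΣXY) / (N − 2))."""
    a, b = fit_line(xs, ys)
    s_y2 = sum(y * y for y in ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((s_y2 - a * sum(ys) - b * s_xy) / (len(xs) - 2))


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
print(round(std_error_of_estimate(X, Y), 2))   # S_yx: 1.02
print(round(std_error_of_estimate(Y, X), 2))   # S_xy: 1.44
```

The shortcut function gives the same value without listing the computed $Y_c$ values, which is why the text recommends it for hand computation.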


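MISCELLANEOUS ILLUSTRATIONS

Illustration 13. From the data given below find:
(a) the two regression equations,
(b) the coefficient of correlation between the marks in Economics and Statistics,
(c) the most likely marks in Statistics when marks in Economics are 30.

Marks in Economics :  25  28  35  32  31  36  29  38  34  32
Marks in Statistics:  43  46  49  41  36  32  31  30  33  39

(B. Com., Bangalore, 1973)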
Solution: COMPUTATION OF REGRESSION EQUATIONS AND CORRELATION COEFFICIENT

Marks in        x = X − 32   x²   Marks in       y = Y − 38   y²    xy
Economics X                       Statistics Y
   25              −7        49       43             +5       25    −35
   28              −4        16       46             +8       64    −32
   35              +3         9       49            +11      121    +33
   32               0         0       41             +3        9      0
   31              −1         1       36             −2        4     +2
   36              +4        16       32             −6       36    −24
   29              −3         9       31             −7       49    +21
   38              +6        36       30             −8       64    −48
   34              +2         4       33             −5       25    −10
   32               0         0       39             +1        1      0
ΣX = 320        Σx = 0   Σx² = 140  ΣY = 380      Σy = 0   Σy² = 398  Σxy = −93

Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$

$$b_{xy} = \frac{\Sigma xy}{\Sigma y^2} = \frac{-93}{398} = -0.234; \qquad \bar{X} = \frac{\Sigma X}{N} = \frac{320}{10} = 32, \quad \bar{Y} = \frac{\Sigma Y}{N} = \frac{380}{10} = 38$$

Substituting the values, $X - 32 = -0.234(Y - 38) = -0.234Y + 8.892$, or $X = 40.892 - 0.234Y$.

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{-93}{140} = -0.664$$

$$Y - 38 = -0.664(X - 32) = -0.664X + 21.248 \quad\text{or}\quad Y = 59.248 - 0.664X$$

$$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{0.234 \times 0.664} = -0.394$$

Since both the regression coefficients are negative, the value of r must also be negative.

Likely marks in Statistics when marks in Economics are 30:

$$Y = -0.664X + 59.248; \quad\text{when } X = 30,\ Y = (-0.664 \times 30) + 59.248 = 39.328 \text{ or } 39$$
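When deviations are taken from the actual means, $b_{yx} = \Sigma xy/\Sigma x^2$ and $b_{xy} = \Sigma xy/\Sigma y^2$, and r carries the common sign of the two coefficients. The hypothetical helper `two_regression_lines` below applies exactly these steps to the data of Illustration 13.

```python
import math

def two_regression_lines(xs, ys):
    """Return (b_yx, b_xy, r, x_mean, y_mean) using deviations from actual means."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    x = [v - x_mean for v in xs]
    y = [v - y_mean for v in ys]
    s_xy = sum(p * q for p, q in zip(x, y))
    b_yx = s_xy / sum(p * p for p in x)   # Σxy / Σx²
    b_xy = s_xy / sum(q * q for q in y)   # Σxy / Σy²
    # r = ±sqrt(b_yx * b_xy), taking the common sign of the two coefficients
    r = math.copysign(math.sqrt(b_yx * b_xy), s_xy)
    return b_yx, b_xy, r, x_mean, y_mean


econ = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
b_yx, b_xy, r, xm, ym = two_regression_lines(econ, stat)
print(round(b_yx, 3), round(b_xy, 3), round(r, 3))   # -0.664 -0.234 -0.394
print(round(ym + b_yx * (30 - xm), 2))               # 39.33
```

The same function can be reused for the other raw-data illustrations in this section; for example, with the experience and rating data of Illustration 18 it gives about 77.6 for 7 years' experience.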


Illustration 14. Given the following values:

                                     Mean    S.D.
Yield of wheat (kg. per unit area)    10       8
Annual Rainfall (inches)               8       2
Correlation Coefficient = 0.5

Estimate the yield when rainfall is 9 inches. (I.C.W.A., Jan., 1972)

Solution. Let yield be denoted by Y and rainfall by X. We have to estimate the yield when rainfall is 9 inches, which will be given by the regression equation of Y on X.

Regression equation of Y on X is given by

$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}), \text{ where } r = 0.5,\ \sigma_y = 8,\ \sigma_x = 2,\ \bar{X} = 8,\ \bar{Y} = 10$$

Substituting the values:

$$Y - 10 = 0.5 \times \frac{8}{2}(X - 8) = 2(X - 8) = 2X - 16 \quad\text{or}\quad Y = 2X - 6$$

When X = 9, Y = (2 × 9) − 6 = 12.

Thus the probable yield when rainfall is 9 inches = 12 kg. (per unit area).
Illustration 15. Given the following data: Variance of X = 9.
Regression equations: 4X − 5Y + 33 = 0; 20X − 9Y − 107 = 0.
Find (i) the mean values of X and Y,
(ii) the coefficient of correlation between X and Y,
(iii) the standard deviation of Y.
(B. Com., Poona, 1970; B. Com., Bombay, 1974)

Solution. (i) Calculating the mean values of X and Y

Since both regression lines pass through the point $(\bar{X}, \bar{Y})$, the means are obtained by solving the two equations simultaneously. We are given

4X − 5Y = −33     ...(i)
20X − 9Y = 107    ...(ii)

Multiplying equation (i) by 5 to eliminate X for finding out the mean of Y,

20X − 25Y = −165  ...(iii)
20X − 9Y = 107    ...(iv)

Subtracting equation (iv) from (iii), −16Y = −272 or Y = 17.

Substituting the value of Y in equation (i), 4X − (5 × 17) = −33, so 4X = −33 + 85 = 52 or X = 13.

Thus the mean values of X and Y are 13 and 17 respectively.

(ii) For finding out the value of the coefficient of correlation we should know the two regression coefficients. These we can find from the regression equations given to us. Since we do not know which of the two equations is the regression equation of X on Y, we assume that equation (i) is the regression equation of X on Y:

$$4X = 5Y - 33 \quad\text{or}\quad X = \frac{5}{4}Y - \frac{33}{4}, \qquad b_{xy} = \frac{5}{4} = 1.25$$

From equation (ii) we can find the regression coefficient of Y on X:

$$9Y = 20X - 107 \quad\text{or}\quad Y = \frac{20}{9}X - \frac{107}{9}, \qquad b_{yx} = \frac{20}{9} = 2.22$$

But this is not possible because both the regression coefficients are greater than 1, so their product (which equals $r^2$) would exceed 1. Our assumption is therefore wrong. Treating equation (i) as the regression equation of Y on X, and (ii) as the regression equation of X on Y, we get

$$b_{yx} = \frac{4}{5} = 0.8 \quad [5Y = 4X + 33], \qquad b_{xy} = \frac{9}{20} = 0.45 \quad [20X = 107 + 9Y]$$

$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.45 \times 0.8} = \sqrt{0.36} = 0.6$$

(iii) Calculating the standard deviation of Y

$$b_{xy} = r\frac{\sigma_x}{\sigma_y} \quad\Rightarrow\quad 0.45 = 0.6 \times \frac{3}{\sigma_y} \quad\text{or}\quad 0.45\sigma_y = 1.8, \quad \sigma_y = 4$$

Thus the mean values of X and Y are 13 and 17 respectively, the coefficient of correlation is +0.6 and the standard deviation of Y is 4.
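The procedure of Illustrations 15, 16 and 25 (solve the two lines for the means, then decide which line is which by checking that $r^2 = b_{yx} b_{xy}$ does not exceed 1) can be sketched as follows. The helper `analyse_two_lines` and its `(a, b, c)` convention for $aX + bY + c = 0$ are assumptions for this illustration. Because the products obtained from the two possible assignments are reciprocals of each other, at most one assignment can give $r^2 < 1$.

```python
import math

def analyse_two_lines(l1, l2):
    """Each line is (a, b, c) meaning a*X + b*Y + c = 0.

    Returns (x_mean, y_mean, b_yx, b_xy, r), trying line 1 as Y on X first and
    swapping the roles if the implied r² would exceed 1.
    """
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    # Both lines pass through the means: solve the 2x2 system by Cramer's rule.
    det = a1 * b2 - a2 * b1
    x_mean = (b1 * c2 - b2 * c1) / det
    y_mean = (a2 * c1 - a1 * c2) / det
    for yx_line, xy_line in ((l1, l2), (l2, l1)):
        b_yx = -yx_line[0] / yx_line[1]   # line solved for Y: slope on X
        b_xy = -xy_line[1] / xy_line[0]   # line solved for X: slope on Y
        if b_yx * b_xy <= 1:
            r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
            return x_mean, y_mean, b_yx, b_xy, r
    raise ValueError("neither assignment gives r² <= 1")


x_mean, y_mean, b_yx, b_xy, r = analyse_two_lines((4, -5, 33), (20, -9, -107))
sd_y = r * 3 / b_xy   # b_xy = r * sd_x / sd_y, with sd_x = sqrt(9) = 3
print(x_mean, y_mean, round(r, 2), round(sd_y, 2))   # 13.0 17.0 0.6 4.0
```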
Illustration 16. Two random variables have the regression lines with equations 3X + 2Y − 26 = 0 and 6X + Y − 31 = 0. Find the mean values and the correlation coefficient between X and Y. If the variance of X is 25, find $\sigma_y$ from the data given above. (M. Com., Delhi, 1969)

Solution. (a) Mean values of X and Y

3X + 2Y = 26   ...(i)
6X + Y = 31    ...(ii)

Multiplying equation (i) by 2 and eliminating X,

6X + 4Y = 52   ...(iii)
6X + Y = 31    ...(iv)

Deducting equation (iv) from (iii), 3Y = 21 or Y = 7.

Substituting the value of Y in equation (i), 3X + 2(7) = 26 or 3X = 26 − 14 or X = 4.

Hence $\bar{X} = 4$ and $\bar{Y} = 7$.

(b) Correlation coefficient

For determining the value of r, we must know the values of the two regression coefficients.

Let equation (i) be treated as the regression equation of Y on X:

$$3X + 2Y - 26 = 0 \quad\text{or}\quad 2Y = 26 - 3X \quad\text{or}\quad Y = 13 - 1.5X, \qquad b_{yx} = -1.5$$

Let equation (ii) be treated as the regression equation of X on Y:

$$6X + Y - 31 = 0 \quad\text{or}\quad 6X = 31 - Y \quad\text{or}\quad X = \frac{31}{6} - \frac{1}{6}Y, \qquad b_{xy} = -\frac{1}{6}$$

$$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{\frac{1}{6} \times 1.5} = -\sqrt{0.25} = -0.5$$

Since $r^2 = 0.25$ does not exceed 1, the assumption is valid; r is negative because both regression coefficients are negative.

(c) Calculating $\sigma_y$

$\sigma_x^2 = 25$ (given), so $\sigma_x = 5$.

$$b_{yx} = r\frac{\sigma_y}{\sigma_x}$$

Substituting the values, $-0.5 \times \frac{\sigma_y}{5} = -1.5$, so $-0.5\sigma_y = -7.5$ or $\sigma_y = 15$.

Illustration 17. From the following information calculate the regression equations:

ΣX = 30, ΣY = 40, ΣX² = 220, ΣY² = 340, ΣXY = 214, N = 5

Also find the coefficient of correlation.
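Solution. Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad \bar{X} = \frac{\Sigma X}{N} = \frac{30}{5} = 6, \quad \bar{Y} = \frac{\Sigma Y}{N} = \frac{40}{5} = 8$$

$$b_{xy} = \frac{\Sigma XY - N\bar{X}\bar{Y}}{\Sigma Y^2 - N\bar{Y}^2} \quad \text{[when figures are given in original values]}$$

$$b_{xy} = \frac{214 - (5 \times 6 \times 8)}{340 - 5(8)^2} = \frac{214 - 240}{340 - 320} = \frac{-26}{20} = -1.3$$

$$X - 6 = -1.3(Y - 8) = -1.3Y + 10.4 \quad\text{or}\quad X = 16.4 - 1.3Y$$

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$, where $\bar{Y} = 8$, $\bar{X} = 6$

$$b_{yx} = \frac{\Sigma XY - N\bar{X}\bar{Y}}{\Sigma X^2 - N\bar{X}^2} \quad \text{[when figures are given in original values]}$$

$$b_{yx} = \frac{214 - (5 \times 6 \times 8)}{220 - 5(6)^2} = \frac{214 - 240}{220 - 180} = \frac{-26}{40} = -0.65$$

$$Y - 8 = -0.65(X - 6) = -0.65X + 3.90 \quad\text{or}\quad Y = 11.9 - 0.65X$$

$$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{1.3 \times 0.65} = -\sqrt{0.845} = -0.919$$

Note. We can solve this question through the normal equations also, but this is an easier procedure.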
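When only the totals ΣX, ΣY, ΣX², ΣY², ΣXY and N are available, the formulas for original values can be applied directly. A minimal sketch follows (the helper name `lines_from_totals` is hypothetical). These particular totals are in fact those of the data in Illustration 12, which is why the same two equations appear.

```python
import math

def lines_from_totals(n, sx, sy, sx2, sy2, sxy):
    """Both regression lines from ΣX, ΣY, ΣX², ΣY², ΣXY (original values)."""
    xm, ym = sx / n, sy / n
    b_yx = (sxy - n * xm * ym) / (sx2 - n * xm ** 2)
    b_xy = (sxy - n * xm * ym) / (sy2 - n * ym ** 2)
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    # intercepts: Y = a_yx + b_yx*X and X = a_xy + b_xy*Y
    return (ym - b_yx * xm, b_yx), (xm - b_xy * ym, b_xy), r


(a_yx, b_yx), (a_xy, b_xy), r = lines_from_totals(5, 30, 40, 220, 340, 214)
print(round(a_yx, 2), b_yx)   # 11.9 -0.65
print(round(a_xy, 2), b_xy)   # 16.4 -1.3
print(round(r, 3))            # -0.919
```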


Illustration 18. The following data give the experience of machine operators and their performance ratings as given by the number of good parts turned out per 100 pieces.

Operators            1    2    3    4    5    6    7    8
Experience (X)      16   12   18    4    3   10    5   12
Performance
Ratings (Y)         87   88   89   68   78   80   75   83

Calculate the regression line of performance ratings on experience and estimate the probable performance if an operator has 7 years experience. (M.B.A., Delhi 1973)

Solution. Let performance ratings be denoted by Y and experience by X. We have to calculate the regression line of Y on X.

CALCULATING REGRESSION LINE OF Y ON X

Experience   x = X − 10   x²   Performance    y = Y − 81   y²    xy
X                               Ratings Y
  16            +6        36       87            +6        36    +36
  12            +2         4       88            +7        49    +14
  18            +8        64       89            +8        64    +64
   4            −6        36       68           −13       169    +78
   3            −7        49       78            −3         9    +21
  10             0         0       80            −1         1      0
   5            −5        25       75            −6        36    +30
  12            +2         4       83            +2         4     +4
ΣX = 80      Σx = 0   Σx² = 218  ΣY = 648     Σy = 0   Σy² = 368  Σxy = 247

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$, where $\bar{X} = \frac{80}{8} = 10$, $\bar{Y} = \frac{648}{8} = 81$

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{247}{218} = 1.133$$

$$Y - 81 = 1.133(X - 10) = 1.133X - 11.33 \quad\text{or}\quad Y = 1.133X + 69.67$$

When X = 7, Y will be

$$Y = 1.133(7) + 69.67 = 7.931 + 69.67 = 77.60 \text{ or } 78$$

Thus the probable performance of an operator who has 7 years' experience = 78 good parts out of 100.

Illustration 19. Assume that we conduct an experiment with eight fields planted to corn, four fields having no nitrogen fertilizer and four fields having 80 pounds of nitrogen fertilizer. The resulting corn yields are shown in the table in bushels per acre:

Field   Nitrogen   Corn Yield        Field   Nitrogen   Corn Yield
        (pounds)   (bushels/acre)            (pounds)   (bushels/acre)
  1         0           12             5        80          128
  2         0           36             6        80          112
  3         0            6             7        80          112
  4         0           18             8        80           76

(a) Compute a linear regression equation by least squares. Explain the meaning of the regression equation in terms of fertilizer and corn yields.
(b) Predict corn yield for a field treated with 60 pounds of fertilizer. (M.B.A., Delhi 1974)

Solution. Let nitrogen be denoted by X and corn yield by Y. We have to fit a regression of Y on X, i.e.,

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

Field  Nitrogen X  dx = X − 40    dx²    Corn Yield Y  dy = Y − 62    dy²    dx·dy
  1        0           −40       1,600        12           −50       2,500   2,000
  2        0           −40       1,600        36           −26         676   1,040
  3        0           −40       1,600         6           −56       3,136   2,240
  4        0           −40       1,600        18           −44       1,936   1,760
  5       80           +40       1,600       128           +66       4,356   2,640
  6       80           +40       1,600       112           +50       2,500   2,000
  7       80           +40       1,600       112           +50       2,500   2,000
  8       80           +40       1,600        76           +14         196     560
N = 8  ΣX = 320   Σdx = 0   Σdx² = 12,800   ΣY = 500   Σdy = 4   Σdy² = 17,800   Σdxdy = 14,240

$$\bar{X} = \frac{\Sigma X}{N} = \frac{320}{8} = 40, \qquad \bar{Y} = \frac{\Sigma Y}{N} = \frac{500}{8} = 62.5$$

$$b_{yx} = \frac{\Sigma d_x d_y - \frac{(\Sigma d_x)(\Sigma d_y)}{N}}{\Sigma d_x^2 - \frac{(\Sigma d_x)^2}{N}} = \frac{14{,}240 - \frac{0 \times 4}{8}}{12{,}800 - \frac{0^2}{8}} = 1.1125$$

$$Y - 62.5 = 1.1125(X - 40) = 1.1125X - 44.5 \quad\text{or}\quad Y = 1.1125X + 18$$

When X = 60, Y will be Y = 1.1125(60) + 18 = 84.75.

Thus the estimated corn yield for a field treated with 60 pounds of fertilizer is 84.75 bushels per acre.
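A useful way to read the fitted line in part (a): with only two nitrogen levels, the least-squares line passes through the average yield at each level. The sketch below (the helper `fit_line` is illustrative) fits the line and checks this property, so the intercept 18 is the mean yield of the unfertilized fields and the slope 1.1125 is the extra yield per pound implied by the two group averages, (107 − 18)/80.

```python
def fit_line(xs, ys):
    """Least-squares line ys = a + b*xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


nitrogen = [0, 0, 0, 0, 80, 80, 80, 80]
corn = [12, 36, 6, 18, 128, 112, 112, 76]
a, b = fit_line(nitrogen, corn)
print(round(a, 4), round(b, 4))     # 18.0 1.1125
print(round(a + b * 60, 2))         # 84.75
# With two nitrogen levels, the line joins the two group means:
print(round(a + b * 0, 4), sum(corn[:4]) / 4)     # 18.0 18.0
print(round(a + b * 80, 4), sum(corn[4:]) / 4)    # 107.0 107.0
```

The prediction at 60 pounds is an interpolation between the two levels actually tried; the line says nothing reliable outside the 0 to 80 pound range.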


Illustration 20. For 10 observations on price (p) and supply (S) the following data were obtained (in appropriate units):

Σp = 130, ΣS = 220, Σp² = 2,288, ΣS² = 5,506, ΣpS = 3,467, N = 10

Obtain the line of regression of S on p and estimate the supply when the price is 16 units, and find out the standard error of the estimate. (M.A. Econ., Delhi, 1972)

Solution:
Regression of S on p is given by: S = a + bp.

The values of a and b can be determined by solving the following two normal equations:

ΣS = Na + bΣp
ΣpS = aΣp + bΣp²

Substituting the given values in the above equations:

220 = 10a + 130b          ...(i)
3,467 = 130a + 2,288b     ...(ii)

Multiplying equation (i) by 13:

2,860 = 130a + 1,690b     ...(iii)

Subtracting (iii) from (ii): 607 = 598b or b = 1.015.

Substituting the value of b in equation (i):

220 = 10a + 130(1.015), so 10a + 131.95 = 220 or 10a = 220 − 131.95, and a = 8.805

Thus S = 8.805 + 1.015p.

When price is 16 units, the estimated supply will be:

S = 8.805 + 1.015(16) = 8.805 + 16.240 = 25.045

$$\text{Standard Error of Estimate } S_{sp} = \sqrt{\frac{\Sigma S^2 - a\Sigma S - b\Sigma pS}{N - 2}} = \sqrt{\frac{5506 - 220(8.805) - 3467(1.015)}{10 - 2}} = \sqrt{\frac{5506 - 1937.1 - 3519.0}{8}} = \sqrt{\frac{49.9}{8}} = 2.497$$
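The two normal equations form a 2 × 2 linear system, so they can also be solved by Cramer's rule. The sketch below (helper names are hypothetical) does that from the given totals and then applies the shortcut standard-error formula.

```python
import math

def solve_normal_equations(n, sp, ss, sp2, sps):
    """Solve ΣS = Na + bΣp and ΣpS = aΣp + bΣp² for (a, b) by Cramer's rule."""
    det = n * sp2 - sp * sp
    b = (n * sps - sp * ss) / det
    a = (ss - b * sp) / n
    return a, b

def std_error_from_totals(n, ss, ss2, sps, a, b):
    """sqrt((ΣS² − aΣS − bΣpS) / (N − 2))."""
    return math.sqrt((ss2 - a * ss - b * sps) / (n - 2))


a, b = solve_normal_equations(10, 130, 220, 2288, 3467)
print(round(a, 3), round(b, 3))                                  # 8.804 1.015
print(round(a + b * 16, 2))                                      # 25.05
print(round(std_error_from_totals(10, 220, 5506, 3467, a, b), 2))  # 2.5
```

The small differences from the text (8.805, 25.045 and 2.497) arise because the text rounds b to 1.015 before computing a.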


Illustration 20. For 50 students of a class the regression equation of marks in Statistics (X) on marks in Accountancy (Y) is 3Y − 5X + 180 = 0. The mean marks in Accountancy are 44 and the variance of marks in Statistics is 9/16th of the variance of marks in Accountancy. Find the mean marks in Statistics and the coefficient of correlation between marks in the two subjects. (T.D.C. Final, Raj., 1973)

Solution:
We are given: N = 50, regression equation of X on Y as 3Y − 5X + 180 = 0, $\bar{Y} = 44$, $\sigma_x^2 = \frac{9}{16}\sigma_y^2$.

We have to find (i) $\bar{X}$ and (ii) r.

(i) Calculating $\bar{X}$

$$3\bar{Y} - 5\bar{X} + 180 = 0 \quad\text{or}\quad 3(44) - 5\bar{X} + 180 = 0$$

$$-5\bar{X} = -180 - 132 = -312, \qquad \bar{X} = 62.4$$

Hence $\bar{X} = 62.4$.

(ii) Calculating the coefficient of correlation

$$3Y - 5X + 180 = 0 \quad\text{or}\quad 5X = 180 + 3Y \quad\text{or}\quad X = 36 + 0.6Y$$

$$b_{xy} = 0.6 \quad\text{or}\quad 0.6 = r\frac{\sigma_x}{\sigma_y}$$

Squaring both sides, $0.36 = r^2\frac{\sigma_x^2}{\sigma_y^2}$.

Now let $\sigma_y^2$ be 16; then $\sigma_x^2 = \frac{9}{16} \times 16 = 9$, and

$$r^2 = \frac{0.36 \times 16}{9} = 0.64, \qquad r = \frac{0.6 \times 4}{3} = 0.8$$

Hence r = 0.8.


Illustration 21. The following data relate to the height (X) and weight (Y) of 1,000 business executives:

              Mean value        Standard deviation
Height (X)    68.00 inches      2.50 inches
Weight (Y)    150.0 pounds      20.0 pounds

If the coefficient of correlation between X and Y = +0.6, estimate (a) the height of an executive whose weight is 200 pounds and (b) the weight of an executive whose height is 60 inches. (M. Com., Nagpur, 1974)

Solution:

(a) The height of an executive whose weight is 200 lbs. shall be given by the regression equation of X on Y:

$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}), \text{ where } \bar{X} = 68,\ r = 0.6,\ \sigma_x = 2.5,\ \sigma_y = 20,\ \bar{Y} = 150$$

$$X - 68 = 0.6 \times \frac{2.5}{20}(Y - 150) = 0.075(Y - 150) = 0.075Y - 11.25$$

$$X = 0.075Y + 68 - 11.25 = 0.075Y + 56.75$$

When Y (weight) is 200 pounds, X (height) shall be:

X = 0.075(200) + 56.75 = 15.0 + 56.75 = 71.75 inches.

Thus the estimated height of an executive whose weight is 200 pounds is 71.75 inches.

(b) The weight of an executive whose height is 60 inches shall be given by the regression equation of Y on X:

$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$

$$Y - 150 = 0.6 \times \frac{20}{2.5}(X - 68) = 4.8(X - 68) = 4.8X - 326.4$$

$$Y = 4.8X - 176.4$$

When X is 60, Y will be Y = 4.8(60) − 176.4 = 111.6 lbs.

Thus the estimated weight of an executive whose height is 60 inches is 111.6 lbs.
Illustration 22. Following is the distribution of students according to their height and weight:

Height in inches              Weight in lbs.
                  90–100   100–110   110–120   120–130
50–55                4        ...       ...       ...
55–60                6         10         7         4
60–65                6         12        10       ...
65–70                3          8         6         3

Calculate (i) the two coefficients of regression, and (ii) obtain the two regression equations. (M. Com., Delhi, 1973)


Solution. Let height be denoted by X and weight by Y.

COMPUTATIONS REQUIRED FOR FINDING THE REGRESSION EQUATIONS

Taking deviations in class-interval units, $d_x = (X - 57.5)/5$ and $d_y = (Y - 115)/10$, the correlation table gives:

N = 100, Σfdx = 57, Σfdx² = 133, Σfdy = −59, Σfdxdy = −26, $i_x = 5$, $i_y = 10$

Regression Coefficients:

Regression coefficient of Y on X:

$$b_{yx} = \frac{\Sigma f d_x d_y - \frac{(\Sigma f d_x)(\Sigma f d_y)}{N}}{\Sigma f d_x^2 - \frac{(\Sigma f d_x)^2}{N}} \times \frac{i_y}{i_x} = \frac{-26 - \frac{(57)(-59)}{100}}{133 - \frac{(57)^2}{100}} \times \frac{10}{5} = \frac{-26 + 33.63}{133 - 32.49} \times 2 = \frac{7.63}{100.51} \times 2 = +0.152$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{\Sigma f d_x d_y - \frac{(\Sigma f d_x)(\Sigma f d_y)}{N}}{\Sigma f d_y^2 - \frac{(\Sigma f d_y)^2}{N}} \times \frac{i_x}{i_y} = \frac{7.63}{\Sigma f d_y^2 - \frac{(-59)^2}{100}} \times \frac{5}{10} = 0.041$$

Regression Equations

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$

$$\bar{Y} = A + \frac{\Sigma f d_y}{N} \times C = 115 - \frac{59}{100} \times 10 = 109.1$$

$$\bar{X} = A + \frac{\Sigma f d_x}{N} \times C = 57.5 + \frac{57}{100} \times 5 = 57.5 + 2.85 = 60.35$$

$$Y - 109.1 = 0.152(X - 60.35) = 0.152X - 9.17 \quad\text{or}\quad Y = 0.152X + 99.93$$

Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$

$$X - 60.35 = 0.041(Y - 109.1) = 0.041Y - 4.47 \quad\text{or}\quad X = 0.041Y + 55.88$$


Illustration 23. The following table gives aptitude test scores and productivity indices of 8 randomly selected workers. Find the equation to the line which can be used to predict the productivity index from the aptitude score. Estimate the productivity index of a worker whose test score is 66.

Aptitude score (X)        57   58   59   59   60   61   62   64
Productivity index (Y)    67   68   65   68   72   72   69   71
(M. Com. Meerut, 1975)

Solution.
We have to fit a regression of Y on X.

FITTING REGRESSION OF Y ON X

 X    x = X − 60   x²    Y    y = Y − 69   y²   xy
57       −3         9    67      −2        4    6
58       −2         4    68      −1        1    2
59       −1         1    65      −4       16    4
59       −1         1    68      −1        1    1
60        0         0    72      +3        9    0
61       +1         1    72      +3        9    3
62       +2         4    69       0        0    0
64       +4        16    71      +2        4    8
ΣX = 480   Σx = 0   Σx² = 36   ΣY = 552   Σy = 0   Σy² = 44   Σxy = 24

Regression of Y on X is given by $Y - \bar{Y} = b_{yx}(X - \bar{X})$

$$\bar{X} = \frac{\Sigma X}{N} = \frac{480}{8} = 60, \qquad \bar{Y} = \frac{\Sigma Y}{N} = \frac{552}{8} = 69, \qquad b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{24}{36} = \frac{2}{3}$$

$$Y - 69 = \frac{2}{3}(X - 60) = \frac{2}{3}X - 40 \quad\text{or}\quad Y = \frac{2}{3}X + 29$$

When X is 66, Y will be $Y = \frac{2}{3} \times 66 + 29 = 73$.


Illustration 24. A panel of two judges P and Q graded seven dramatic performances by independently awarding marks as follows:

Performance    1    2    3    4    5    6    7
Marks by P    46   42   44   40   43   41   45
Marks by Q    40   38   36   35   39   37   41

The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks would be expected to have been awarded by him to the eighth performance? (B. Com., Delhi, 1975)

Solution. The marks expected to be awarded by judge Q can be found out by fitting a regression equation. Let marks by P be denoted by X and marks by Q by Y. We have to fit an equation of Y on X.

Marks by P   x = X − 43   x²   Marks by Q   y = Y − 38   y²   xy
   46           +3         9       40           +2        4    6
   42           −1         1       38            0        0    0
   44           +1         1       36           −2        4   −2
   40           −3         9       35           −3        9    9
   43            0         0       39           +1        1    0
   41           −2         4       37           −1        1    2
   45           +2         4       41           +3        9    6
ΣX = 301     Σx = 0    Σx² = 28  ΣY = 266    Σy = 0   Σy² = 28  Σxy = 21

Regression equation of Y on X is given by:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}); \qquad \bar{X} = \frac{301}{7} = 43, \quad \bar{Y} = \frac{266}{7} = 38, \quad b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{21}{28} = 0.75$$

$$Y - 38 = 0.75(X - 43) = 0.75X - 32.25 \quad\text{or}\quad Y = 0.75X + 5.75$$

When X is 37, Y will be Y = 0.75(37) + 5.75 = 33.50.

Thus if Q had been present, he would have awarded 33.5 marks (or 34 marks approximately) to the eighth performance.

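Illustration 26. Calculate the regression equation of X on Y and Y on X from the following data and estimate X when Y is 20.

X   10   12   13   17   18
Y    5    6    7    9   13

Also determine the value of the correlation coefficient.

Solution. Taking deviations from the actual means, $\bar{X} = \frac{70}{5} = 14$ and $\bar{Y} = \frac{40}{5} = 8$, we get Σx² = 46, Σy² = 40 and Σxy = 40.

Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$

$$b_{xy} = \frac{\Sigma xy}{\Sigma y^2} = \frac{40}{40} = 1$$

$$X - 14 = Y - 8 \quad\text{or}\quad X = Y + 6$$

When Y is 20, X would be X = 20 + 6 or 26.

Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2} = \frac{40}{46} = 0.87$$

$$Y - 8 = 0.87(X - 14) = 0.87X - 12.18 \quad\text{or}\quad Y = 0.87X - 4.18$$

$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{1 \times 0.87} = 0.93$$

($b_{xy} = 1$ since X = Y + 6; $b_{yx} = 0.87$ since Y = 0.87X − 4.18.)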
Illustration 25. For certain X and Y series which are correlated, the two lines of regression are:

5X − 6Y + 90 = 0
15X − 8Y − 130 = 0

Find which is the regression of Y on X and which is of X on Y. Find the means of the two series and the correlation coefficient. (M. Com. Meerut, 1975)

Solution:
Calculating mean values:

5X − 6Y = −90    ...(i)
15X − 8Y = 130   ...(ii)

Multiplying equation (i) by 3:

15X − 18Y = −270 ...(iii)

Subtracting (ii) from (iii): −10Y = −400 or Y = 40.

Putting the value of Y in equation (i): 5X − 6 × 40 = −90 or 5X = −90 + 240, so 5X = 150 and X = 30.

Hence $\bar{Y} = 40$ and $\bar{X} = 30$.

Determining the value of the correlation coefficient

Let us take the first equation as the regression of Y on X:

$$5X - 6Y + 90 = 0 \quad\text{or}\quad -6Y = -5X - 90 \quad\text{or}\quad Y = \frac{5}{6}X + 15, \qquad b_{yx} = \frac{5}{6}$$

From equation (ii):

$$15X - 8Y - 130 = 0 \quad\text{or}\quad 15X = 130 + 8Y \quad\text{or}\quad X = \frac{130}{15} + \frac{8}{15}Y, \qquad b_{xy} = \frac{8}{15}$$

$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{\frac{8}{15} \times \frac{5}{6}} = \sqrt{\frac{4}{9}} = 0.667$$

Since the value of r has not exceeded one, our assumption is correct. Equation (i) is the regression of Y on X and equation (ii) the regression of X on Y.
Illustration 26. Fit a regression line of consumption C on income Y on the basis of the following data:

C (Rs.)    90    95   100   105   106
Y (Rs.)   100   110   120   130   140

Comment on your results. (B.A. Hons. Econ. Delhi, 1975)

Solution:

CALCULATION OF REGRESSION LINE OF CONSUMPTION (C) ON INCOME (Y)

  C      dc = C − 100   dc²     Y     dy = (Y − 120)/10   dy²   dc·dy
(Rs.)                         (Rs.)
  90        −10         100    100          −2             4     20
  95         −5          25    110          −1             1      5
 100          0           0    120           0             0      0
 105         +5          25    130          +1             1      5
 106         +6          36    140          +2             4     12
ΣC = 496   Σdc = −4   Σdc² = 186   ΣY = 600   Σdy = 0   Σdy² = 10   Σdcdy = 42

Regression line of consumption on income is given by

$$C - \bar{C} = r\frac{\sigma_c}{\sigma_y}(Y - \bar{Y}), \qquad \bar{C} = \frac{\Sigma C}{N} = \frac{496}{5} = 99.2, \quad \bar{Y} = \frac{\Sigma Y}{N} = \frac{600}{5} = 120$$

$$r\frac{\sigma_c}{\sigma_y} = \frac{\Sigma d_c d_y - \frac{(\Sigma d_c)(\Sigma d_y)}{N}}{\Sigma d_y^2 - \frac{(\Sigma d_y)^2}{N}} \times \frac{C_c}{C_y} = \frac{42 - \frac{(-4)(0)}{5}}{10 - \frac{0^2}{5}} \times \frac{1}{10} = 0.42$$

$$C - 99.2 = 0.42(Y - 120) = 0.42Y - 50.4 \quad\text{or}\quad C = 0.42Y + 48.8$$

where $C_c$ = common factor for variable C (here 1) and $C_y$ = common factor for variable Y (here 10).
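The factor $C_c/C_y = 1/10$ above is the scale adjustment mentioned for correlation tables: shifting the origin leaves a regression coefficient unchanged, but dividing Y by 10 multiplies the coded coefficient by 10, so it must be divided back. The sketch below (the helper `slope` is illustrative) checks this on the consumption data. Read as usual, the coefficient 0.42 says that, over the observed incomes, consumption rises by about Rs. 0.42 for each additional rupee of income.

```python
def slope(xs, ys):
    """Least-squares regression coefficient of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


C = [90, 95, 100, 105, 106]
Y = [100, 110, 120, 130, 140]
d_c = [c - 100 for c in C]          # change of origin only
d_y = [(y - 120) / 10 for y in Y]   # change of origin and of scale (factor 10)

b_raw = slope(Y, C)                   # regression of C on Y from original values
b_coded = slope(d_y, d_c) * (1 / 10)  # coded coefficient times C_c / C_y
print(round(b_raw, 2), round(b_coded, 2))   # 0.42 0.42
```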
Limitations of Regression Analysis

In making estimates from a regression equation, it is important to remember that the assumption is being made that the relationship has not changed since the regression equation was computed. Another point worth remembering is that the relationship shown by the scatter diagram may not be the same if the equation is extended beyond the values used in computing the equation. For example, there may be a close linear relationship between the yield of a crop and the amount of fertilizer applied, with the yield increasing as the amount of fertilizer is increased. It would not be logical, however, to extend this equation beyond the limits of the experiment, for it is quite likely that if the amount of fertilizer were increased indefinitely, the yield would eventually decline as too much fertilizer was applied.


ASSOCIATION OF ATTRIBUTES


As pointed out earlier, Statistics deals with quantitative phenomena only. However, the quantitative character may arise in either of the following two ways:


1. In the first place, we may measure the actual magnitude or size of some phenomenon. For example, we may measure the height of


statistics of variables. The various statistical techniques like measures of central tendency, dispersion and correlation which have been discussed in the earlier pages deal with such variables.


many are blind and how many are not blind, but we cannot precisely measure blindness. Such phenomena, where direct quantitative measurement is not possible, i.e., where we can study only the presence or absence of a particular characteristic, are called statistics of attributes.


Difference between Correlation and Association


The tool of correlation is used to measure the degree of relationship between two such phenomena as are capable of direct quantitative measurement. On the other hand, the method of association of attributes


While dealing with statistics of attributes we have to classify the data. The classification is done on the basis of presence or absence of a particular characteristic. When we are studying only one attribute, two classes are formed: one possessing that attribute and another not possessing it. For example, when we are studying the attribute employment, two classes shall be formed: those who are employed and those who are not employed. When two attributes are studied, four classes are formed; for example: number of males employed, number of females employed, number of males unemployed and number of females unemployed.
It should be noted that in some cases, while classifying the attributes,
no clear-cut definition of an attribute and no line of demarcation between
classes can be drawn. For example, when the attribute ‘employment’ is
being studied, the data are classified into ‘Employed’ and ‘Unemployed’.
But there can be a further category of those people who are partially
employed (i.e., part-time). Also there may be some persons who were
employed before the survey but are unemployed on the date of the survey.
We can treat them neither as employed nor simply as unemployed, because
there is some difference between those persons who have not got any job
and those who had got some job but were retrenched after some time.
Hence, it is absolutely essential to lay down a clear-cut definition of the
various attributes under study. This is often a difficult task, and this
limitation must be kept in mind while studying association between
attributes.


Notation and Terminology 


For the sake of simplicity and convenience it is imperative to use
certain symbols to represent different classes and their frequencies. It is
customary to use the capital letters A and B to represent the presence of the
attributes and the Greek letters α (alpha) and β (beta) to represent the
absence of the attributes. Thus α = not A and β = not B. For example,
if A represents males then α would represent females. Similarly, if B
represents literates then β would denote illiterates. The combinations of
the different attributes are denoted by AB, Aβ, αB and αβ. Thus in
this example αB would mean literate females and αβ illiterate females.
The number of observations in each class is called the ‘class frequency’.
Thus if the number of literate males is 50, the frequency of class AB is 50.
Class frequencies are denoted by enclosing the class notation in brackets,
like (AB), (αβ), etc. Thus,


(A) denotes the number of individuals possessing attribute A.

(AB) denotes the number of individuals possessing attributes A and B.

(αβ) denotes the number of individuals possessing attributes α and β.


Any letter or combination of letters like A, AB, αβ, etc., by means
of which we specify the characters of the members of a class, may be
termed a class symbol.


Class-frequencies. The number of observations assigned to any
class is termed, for the sake of brevity, the frequency of the class or the
‘class-frequency’. Class-frequencies are denoted by enclosing the
corresponding class symbols in brackets. Thus (B) denotes the number of B's,
i.e., objects possessing attribute B; (Aβ) the number of Aβ's, i.e., objects
possessing attribute A but not B; and so on for any number of attributes.


Order of Classes and Class-frequencies. The order of a class
depends upon the number of attributes specified. A class having one
attribute is known as a class of the first order, a class having two
attributes as a class of the second order, and so on.

E-12:5 ASSOCIATION OF ATTRIBUTES

The total number of observations, denoted by the symbol N, is called the
frequency of the zero order since no attributes are specified. Thus we have:

N                              frequency of the zero order
(A), (B), (α), (β)             frequencies of the first order
(AB), (Aβ), (αB), (αβ)         frequencies of the second order


Number of Frequencies. In a study of n attributes the total
number of class-frequencies is given by 3ⁿ.

(i) For one attribute, the frequencies are 3¹ = 3.

(ii) For two attributes, the total frequencies are 3² = 9. By order they are
1 + 4 + 4 = 9.
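As a quick check of the count 3ⁿ, the short Python sketch below tallies the class-frequencies order by order. It assumes the standard counting argument (not stated in the text) that a class of order r picks r of the n attributes and marks each one as present or absent, giving comb(n, r) × 2 to the power r classes. The function name frequencies_by_order is purely illustrative.

```python
from math import comb


def frequencies_by_order(n):
    """Count the class-frequencies of each order r = 0, 1, ..., n for n attributes.

    A class of order r specifies r of the n attributes (comb(n, r) choices),
    and each specified attribute is either present (A) or absent (alpha),
    which gives 2**r classes for every such choice.
    """
    return [comb(n, r) * 2 ** r for r in range(n + 1)]


for n in (1, 2, 3):
    counts = frequencies_by_order(n)
    # By the binomial theorem the counts add up to (1 + 2)**n = 3**n,
    # and the last entry (the classes of the highest order) equals 2**n.
    print(n, counts, sum(counts), 3 ** n)
```

For two attributes the list is [1, 4, 4], matching the breakdown above; for three it is [1, 6, 12, 8], i.e. 27 frequencies in all.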


Any class frequency can always be expressed in terms of class-frequencies
of higher order, since the total number of observations must be
equal to the number of A's added to the number of α's, i.e.,

N = (A) + (α)

Similarly, the number of A's is equal to the number of A's which are B's
added to the number of A's which are β's, i.e.,

(A) = (AB) + (Aβ)

Similarly, (α) = (αB) + (αβ).
Ultimate Class-frequencies. It is clear from the above that every
class-frequency can be expressed in terms of the frequencies of the highest
order, i.e., of order n. Any frequency can be analysed into frequencies of
higher order, and the process need stop only when we have reached the
frequencies of the highest order. For example, with two attributes,
(AB), (Aβ), (αB) and (αβ) are the ultimate class-frequencies.

The classes specified by n attributes, i.e., those of the highest order,
are termed the ultimate classes, and their frequencies the ultimate
class-frequencies. The data are completely specified if only the ultimate
class-frequencies are given.


The total number of classes of ultimate order is determined by the
formula 2ⁿ, where n stands for the number of attributes studied. If two
attributes are studied, the number of classes of ultimate order shall be
2² = 4. In case three attributes are studied, there would be 2³ = 8
classes of the ultimate order.

The frequencies of the positive, negative and ultimate classes can be
read from the following table, which is known as the nine-square table
(since nine squares are formed):

              A          α          Total
B           (AB)       (αB)         (B)
β           (Aβ)       (αβ)         (β)
Total        (A)        (α)          N
From this table the following relationships can be derived:
(A) = (AB) + (Aβ);   (α) = (αB) + (αβ)
(B) = (AB) + (αB);   (β) = (Aβ) + (αβ)

N = (A) + (α)   or   N = (B) + (β)
or
N = (AB) + (Aβ) + (αB) + (αβ)

From these relationships, if we know any one of the ultimate
class-frequencies and any three other independent values, we can find
out the frequencies of the remaining classes.


Illustration 1. From the following data find out the missing frequencies:
(AB) = 100, (A) = 300, N = 1,000, (B) = 600.

Solution. Putting these values in the nine-square table:

              A          α          Total
B            100        (αB)         600
β           (Aβ)        (αβ)         (β)
Total        300         (α)        1,000

The missing frequencies are (Aβ), (αB), (αβ), (α) and (β).
(Aβ) = (A) − (AB) = 300 − 100 = 200
(α) = N − (A) = 1000 − 300 = 700
(β) = N − (B) = 1000 − 600 = 400
(αB) = (B) − (AB) = 600 − 100 = 500
(αβ) = (β) − (Aβ) = 400 − 200 = 200

Thus the missing frequencies are:

(Aβ) = 200, (αB) = 500, (αβ) = 200, (β) = 400, (α) = 700.
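The subtraction steps of Illustration 1 can be sketched as a small Python function. This is a minimal illustration, assuming the four given values are (AB), (A), (B) and N exactly as in the illustration; the function name complete_nine_square and the ASCII dictionary keys ('Abeta' for (Aβ), 'alphaB' for (αB), and so on) are hypothetical conventions chosen for the example.

```python
def complete_nine_square(AB, A, B, N):
    """Fill in a 2x2 (nine-square) table from (AB), (A), (B) and N.

    Keys use ASCII names: 'Abeta' stands for (A beta), 'alphaB' for
    (alpha B), 'alphabeta' for (alpha beta), and so on.
    """
    alpha = N - A                   # (alpha) = N - (A)
    beta = N - B                    # (beta)  = N - (B)
    A_beta = A - AB                 # (A beta) = (A) - (AB)
    alpha_B = B - AB                # (alpha B) = (B) - (AB)
    alpha_beta = beta - A_beta      # (alpha beta) = (beta) - (A beta)
    return {
        "AB": AB, "Abeta": A_beta, "alphaB": alpha_B, "alphabeta": alpha_beta,
        "A": A, "alpha": alpha, "B": B, "beta": beta, "N": N,
    }


table = complete_nine_square(AB=100, A=300, B=600, N=1000)  # Illustration 1
for name in ("Abeta", "alphaB", "alphabeta", "alpha", "beta"):
    print(name, table[name])
```

The four ultimate frequencies returned always add up to N, which is a handy check on the arithmetic.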


Consistency of Data
In order to find out whether the given data are consistent or not, we
have to apply a very simple test: we find out whether any one or more
of the ultimate class-frequencies is negative. If none of the ultimate
class-frequencies is negative, we can safely conclude that the given data
are consistent (i.e., the frequencies do not conflict in any way with each
other). On the other hand, if any of the ultimate class-frequencies comes
out to be negative, the given data are inconsistent. Thus the necessary
and sufficient condition for the consistency of a set of independent
class-frequencies is that no ultimate class-frequency is negative.
Illustration 2. From the following two cases find out whether the data are
consistent or not:
Case I:  (A) = 100, (B) = 150, (AB) = 60, N = 500
Case II: (A) = 100, (B) = 150, (AB) = 140, N = 500




Solution. Case I:

We are given (A) = 100, (B) = 150, (AB) = 60, N = 500.

Substituting these values in the nine-square table and completing it:

              A          α          Total
B             60         90          150
β             40        310          350
Total        100        400          500

From the table, (Aβ) = (A) − (AB) = 100 − 60 = 40
(αB) = (B) − (AB) = 150 − 60 = 90
(αβ) = (α) − (αB) = 400 − 90 = 310
Since none of the ultimate class-frequencies is negative, we conclude that
the given data are consistent.

Case II. Given values are: (A) = 100, (B) = 150, (AB) = 140, N = 500

By putting these values in the nine-square table, we can determine the missing
values:

              A          α          Total
B            140         10          150
β            −40        390          350
Total        100        400          500

From the table, (Aβ) = (A) − (AB) = 100 − 140 = −40
(αB) = (B) − (AB) = 150 − 140 = 10
(αβ) = (α) − (αB) = 400 − 10 = 390
Thus one of the ultimate class-frequencies, i.e., (Aβ), is negative and hence
the given data are inconsistent.
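The consistency test lends itself to a short Python sketch. It assumes the data arrive as (A), (B), (AB) and N, as in Illustration 2, and uses the relation (αβ) = N − (A) − (B) + (AB), which follows from (αβ) = (β) − (Aβ). The function names are illustrative only.

```python
def ultimate_frequencies(A, B, AB, N):
    """Return the four ultimate class-frequencies implied by (A), (B), (AB), N."""
    return {
        "AB": AB,
        "Abeta": A - AB,                # (A beta)  = (A) - (AB)
        "alphaB": B - AB,               # (alpha B) = (B) - (AB)
        "alphabeta": N - A - B + AB,    # (alpha beta) = N - (A) - (B) + (AB)
    }


def is_consistent(A, B, AB, N):
    """Data are consistent if and only if no ultimate class-frequency is negative."""
    return all(freq >= 0 for freq in ultimate_frequencies(A, B, AB, N).values())


print(is_consistent(A=100, B=150, AB=60, N=500))   # Case I
print(is_consistent(A=100, B=150, AB=140, N=500))  # Case II
```

Case I passes the test, while Case II fails because (Aβ) works out to −40.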
Association and Disassociation


The word association as used in Statistics has a technical meaning
different from the one in ordinary speech. In common language one
speaks of A and B as being ‘associated’ if they appear together in a number
of cases. But in Statistics, A and B are associated only if they appear
together in a greater number of cases than is to be expected if they are
independent. On the other hand, if this number (or proportion) is less
than expected for independence, they are disassociated. Thus, in the case of
second order frequencies, A and B are:

(i) associated if (AB)(αβ) > (Aβ)(αB); and
(ii) disassociated if (AB)(αβ) < (Aβ)(αB).
Hence, it should be carefully noted that association cannot be
inferred from the mere fact that some A's are B's, however great the
proportion.
METHODS OF STUDYING ASSOCIATION
In order to ascertain whether two attributes are associated or not,
the following methods may be used:
I. Comparison of Observed and Expected Frequencies Method.
II. Proportion Method.
III. Yule's Coefficient of Association.
IV. Coefficient of Colligation.
V. Coefficient of Contingency.
I. Comparison of Observed and Expected Frequencies Method.


When this method is applied, the actual observation is compared
with the expectation. If the actual observation is equal to the expectation*,
the attributes are said to be independent; if the actual observation is more
than the expectation, the attributes are said to be positively associated; and
if the actual observation is less than the expectation, the attributes are
said to be negatively associated.

Symbolically, attributes A and B are:

(i) independent if (AB) = (A) × (B)/N;
(ii) positively associated if (AB) > (A) × (B)/N; and
(iii) negatively associated if (AB) < (A) × (B)/N,

where (AB) is the actual observation and (A) × (B)/N is the expectation.

The same is true for attributes α and B, α and β, and A and β. Thus
attributes α and β shall be called:

(i) independent, if (αβ) = (α) × (β)/N;
(ii) positively associated, if (αβ) > (α) × (β)/N; and
(iii) negatively associated, if (αβ) < (α) × (β)/N.

* Expectation is the product of the probability (i.e., the chance of happening of
an event) and the number of observations. For example, when a coin is tossed, the
probability of a head or a tail coming up is equal, i.e., 50% or 1/2. If the coin is
tossed 100 times, the expectation of a head coming up is 1/2 × 100 = 50 and of a tail
coming up is 1/2 × 100 = 50. If two coins are tossed, the chance of two heads coming
up (or, likewise, of two tails) is reduced to 1/2 × 1/2 = 1/4. When two attributes A and
B are studied in a universe N and the class frequencies of these attributes are (A)
and (B), then

Probability of (A) = (A)/N;   Probability of (B) = (B)/N

Probability of (A) and (B) combined (if A and B are independent) = (A)/N × (B)/N

Expectation of (A) and (B) combined = (A)/N × (B)/N × N = (A) × (B)/N


Illustration 3. From the following data find out whether the attributes (i) A and B,
(ii) A and β, (iii) α and B, and (iv) α and β are independent, associated or
disassociated: N = 100, (A) = 40, (B) = 80, (AB) = 30.

Solution. (i) Applying the criterion of independence, attributes A and B shall be called
independent if (AB) = (A) × (B)/N;
positively associated if (AB) > (A) × (B)/N; and
negatively associated or disassociated if (AB) < (A) × (B)/N.

Expectation of (AB) = (A) × (B)/N, where (A) = 40, (B) = 80, N = 100

Expectation of (AB) = 40 × 80/100 = 32.
The actual observation [i.e., the given value of (AB), i.e., 30] is less than the
expectation and hence the attributes are disassociated or negatively associated.
(ii) For finding out the nature of association between the attributes A and β,
α and B, and α and β, we shall have to determine the unknown values. This can be
done by preparing a nine-square table.

              A          α          Total
B             30         50           80
β             10         10           20
Total         40         60          100

From the table, (Aβ) = 10, (αB) = 50, (αβ) = 10, (α) = 60, (β) = 20.
Attributes A and β shall be independent if (Aβ) = (A) × (β)/N

Expectation of (Aβ) = (A) × (β)/N, where (A) = 40, (β) = 20, N = 100

∴ Expectation of (Aβ) = 40 × 20/100 = 8.
Thus the actual observation [i.e., (Aβ) = 10] is more than the expectation and hence
the attributes A and β are positively associated.

(iii) Attributes α and B shall be called independent if (αB) = (α) × (B)/N

Expectation of (αB) = (α) × (B)/N, where (α) = 60, (B) = 80, N = 100

Expectation of (αB) = 60 × 80/100 = 48.

The actual observation [i.e., (αB) = 50] is more than the expectation and hence the
attributes are positively associated.

(iv) Attributes α and β shall be called independent if (αβ) = (α) × (β)/N

Expectation of (αβ) = (α) × (β)/N, where (α) = 60, (β) = 20, N = 100

Expectation of (αβ) = 60 × 20/100 = 12.

The actual observation [(αβ) = 10] is less than the expectation (12). Hence the
attributes are disassociated.
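The comparison of observed and expected frequencies in Illustration 3 can be sketched in Python as follows. The sketch assumes the four ultimate frequencies are first recovered from N, (A), (B) and (AB) exactly as in the nine-square table above; the helper name classify is hypothetical. Python's fractions.Fraction is used so that the expectation is exact and the equality test for independence is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def classify(observed, total_1, total_2, N):
    """Compare an observed class-frequency with its expectation under independence."""
    expected = Fraction(total_1 * total_2, N)   # (total_1) x (total_2) / N
    if observed > expected:
        return "positively associated"
    if observed < expected:
        return "negatively associated"
    return "independent"


# Illustration 3: N = 100, (A) = 40, (B) = 80, (AB) = 30
N, A, B, AB = 100, 40, 80, 30
alpha, beta = N - A, N - B
A_beta, alpha_B = A - AB, B - AB
alpha_beta = alpha - alpha_B

verdicts = {
    "A and B": classify(AB, A, B, N),
    "A and beta": classify(A_beta, A, beta, N),
    "alpha and B": classify(alpha_B, alpha, B, N),
    "alpha and beta": classify(alpha_beta, alpha, beta, N),
}
for pair, verdict in verdicts.items():
    print(pair, "->", verdict)
```

The four verdicts agree with parts (i) to (iv) of the solution.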

Limitations. With the help of this method we can only determine 
the nature of association (i.e., whether there is positive or negative asso- 
ciation or no association) and not the degree of association (i.e., whether 
association is high or low). Yule’s coefficient is superior because it 
provides information not only on the nature but also on the degree of 
association. 


II. Proportion Method

If there is no relationship of any kind between two attributes A and
B, we expect to find the same proportion of A's amongst the B's as
amongst the β's. Thus if a coin is tossed, we expect the same proportion
of heads irrespective of whether the coin is tossed by the right hand or the
left hand.

Symbolically, two attributes may be termed:

(i) independent if (AB)/(B) = (Aβ)/(β);
(ii) positively associated if (AB)/(B) > (Aβ)/(β); and
(iii) negatively associated if (AB)/(B) < (Aβ)/(β).

If relation (i) holds good, the corresponding relations
(αB)/(B) = (αβ)/(β);   (AB)/(A) = (αB)/(α);   (Aβ)/(A) = (αβ)/(α)
must also hold true.


Illustration 4. In a population of 500 students the number of married students is 200.
Out of 150 students who failed, 60 belonged to the married group. It is required to
find out whether the attributes, marriage and failure, are independent, positively
associated or negatively associated.
Solution. Let A denote married students;
∴ α denotes unmarried students.
Let B denote failures;
∴ β would denote non-failures.
We are given the total number of students, i.e., N = 500,
(A) = 200, (B) = 150 and (AB), i.e., the number of married students who failed = 60.
Applying the proportion method, the attributes are independent if
(AB)/(A) = (αB)/(α)
In other words, if the proportion of married students who failed is the same as
the proportion of unmarried students who failed, we say that the attributes, marriage
and failure, are independent.

Proportion of married students who failed,
i.e., (AB)/(A) = 60/200 = 0.3 or 30%

Proportion of unmarried students who failed,
i.e., (αB)/(α) = 90/300 = 0.3 or 30%,
where (αB) = (B) − (AB) = 150 − 60 = 90 and (α) = N − (A) = 500 − 200 = 300.

Since the two proportions are the same, we conclude that the attributes,
marriage and failure, are independent.
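The proportion method of Illustration 4 reduces to comparing two fractions, as in the short Python sketch below. It assumes the inputs are (A), (B), (AB) and N as given in the illustration; the function name proportion_method is illustrative. Exact fractions are used so that "the same proportion" is tested exactly.

```python
from fractions import Fraction


def proportion_method(A, B, AB, N):
    """Compare (AB)/(A) with (alpha B)/(alpha), as in the proportion method.

    Returns the verdict together with the two proportions (exact fractions).
    """
    alpha = N - A                        # number of alpha's (not-A)
    alpha_B = B - AB                     # alpha's that are B's
    p_A = Fraction(AB, A)                # share of B's among the A's
    p_alpha = Fraction(alpha_B, alpha)   # share of B's among the alpha's
    if p_A > p_alpha:
        verdict = "positively associated"
    elif p_A < p_alpha:
        verdict = "negatively associated"
    else:
        verdict = "independent"
    return verdict, p_A, p_alpha


# Illustration 4: A = married, B = failed
verdict, p_married, p_unmarried = proportion_method(A=200, B=150, AB=60, N=500)
print(verdict, float(p_married), float(p_unmarried))
```

Both proportions come out as 3/10, so the sketch reports the attributes as independent.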


Limitation of the Method. Just like the previous method, this method can
only determine the nature of association and not the degree of association.


III. Yule's Coefficient of Association

The most popular method of studying association is Yule's Coefficient,
because here we can determine not only the nature of association, i.e.,
whether the attributes are positively associated, negatively associated
or independent, but also the degree or extent to which the two attributes
are associated. Yule's Coefficient is denoted by the symbol Q and is
obtained by applying the following formula:

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$

The value of this coefficient lies between −1 and +1. When the value of Q
is +1 there is perfect positive association between the attributes; when
Q is −1 there is perfect negative association (or perfect disassociation*)
between the attributes; and when the value of Q is zero the two attributes
are independent.

The coefficient of association can be used to compare the intensity of
association between two attributes with the intensity of association between
two other attributes.


* By the word ‘disassociation’ we do not mean absence of association; rather it
means the presence of negative association.

Illustration 5. Investigate the association between the eye colour of husbands and
the eye colour of wives from the data given below:
Husbands with light eyes and wives with light eyes = 309
Husbands with light eyes and wives with not-light eyes = 214
Husbands with not-light eyes and wives with light eyes = 132
Husbands with not-light eyes and wives with not-light eyes = 119
(M. Com., Gorakhpur, 1972)


Solution. Since we have to find out the association between the eye colour of the
husband and that of the wife, we take one attribute as A and the other as B.

Let A denote husbands with light eyes.
∴ α would denote husbands with not-light eyes.

Let B denote wives with light eyes.
∴ β would denote wives with not-light eyes.

The given data in terms of these symbols are
(AB) = 309, (Aβ) = 214, (αB) = 132, (αβ) = 119.

Applying Yule's method, Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]

Substituting the above values in the formula,
Q = [(309 × 119) − (214 × 132)] / [(309 × 119) + (214 × 132)]
  = (36771 − 28248)/(36771 + 28248) = 8523/65019 = 0.131

Thus there is a low degree of positive association between the eye colour of
husband and that of wife.


Illustration 6. Eighty-eight residents of an Indian city, who were interviewed
during a sample survey, are classified below according to their smoking and tea
drinking habits. Calculate Yule's Coefficient of Association and comment on its value.

                      Smokers    Non-smokers
Tea drinkers             40          33
Non-tea drinkers          3          12
(M. Com., Delhi, 1969)


Solution. Let A denote smokers.
∴ α would denote non-smokers.

Let B denote tea drinkers.
∴ β would denote non-tea drinkers.
The given data in terms of these symbols are:

(AB), i.e., number of smokers and tea drinkers = 40
(Aβ), i.e., number of smokers and non-tea drinkers = 3
(αB), i.e., number of tea drinkers and non-smokers = 33
(αβ), i.e., number of non-smokers and non-tea drinkers = 12

Applying Yule's method: Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]

Substituting the values of (AB), (Aβ), (αB) and (αβ) in this formula,
Q = [(40 × 12) − (3 × 33)] / [(40 × 12) + (3 × 33)] = (480 − 99)/(480 + 99) = 381/579 = 0.658
This shows that the attributes tea drinking and smoking are positively
associated.
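Yule's formula can be checked against Illustrations 5 and 6 with a few lines of Python. The function name yule_q is illustrative, and raising an error when both cross-products are zero is a design choice of this sketch (Q is then undefined), not something stated in the text.

```python
def yule_q(AB, A_beta, alpha_B, alpha_beta):
    """Yule's coefficient of association Q for a 2x2 table of ultimate frequencies."""
    cross = AB * alpha_beta      # (AB)(alpha beta)
    anti = A_beta * alpha_B      # (A beta)(alpha B)
    if cross + anti == 0:
        raise ValueError("Q is undefined when both products are zero")
    return (cross - anti) / (cross + anti)


q_eyes = yule_q(AB=309, A_beta=214, alpha_B=132, alpha_beta=119)   # Illustration 5
q_smoking = yule_q(AB=40, A_beta=3, alpha_B=33, alpha_beta=12)     # Illustration 6
print(round(q_eyes, 3), round(q_smoking, 3))
```

Rounded to three places the two values are 0.131 and 0.658, as in the worked solutions; a table with (Aβ) = (αB) = 0 gives Q = +1 and one with (AB) = (αβ) = 0 gives Q = −1.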
Illustration 7. Given N = 1,500, (A) = 383, (B) = 360, (AB) = 35.

Prepare a 2 × 2 contingency table. Compute Yule's Coefficient of Association
and interpret the result. (M. Com., Allahabad, 1973)

Solution. 2 × 2 contingency table:

              A          α          Total
B             35        325          360
β            348        792         1,140
Total        383      1,117         1,500

Yule's Coefficient of Association
Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
  = [(35 × 792) − (348 × 325)] / [(35 × 792) + (348 × 325)]
  = (27720 − 113100)/(27720 + 113100) = −85380/140820 = −0.606
This shows that A and B are disassociated.
Illustration 8. The following data relate to literacy and unemployment in a
group of 500 persons. You are required to calculate Yule's Coefficient of Association
between literacy and unemployment and interpret it.

Illiterate unemployed    220
Literate employed         20
Illiterate employed      180

Solution. Let A denote literacy and B unemployment.
Hence α will denote illiteracy and β employment.
We are given (αB) = 220, (Aβ) = 20, (αβ) = 180, N = 500. By substituting these
values in the nine-square table, we can find out the value of (AB).

              A          α          Total
B             80        220          300
β             20        180          200
Total        100        400          500

Thus (AB) = 80.
Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
Substituting the values, Q = [(80 × 180) − (20 × 220)] / [(80 × 180) + (20 × 220)]
  = (14400 − 4400)/(14400 + 4400) = 10000/18800 = 0.532
There is positive association between literacy and unemployment.
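Illustrations 7 and 8 first complete the 2 × 2 table from partial information and then apply Yule's formula. A minimal Python sketch of both steps follows; the variable names are illustrative, and the sketch assumes the given values are exactly those listed in the two illustrations.

```python
def yule_q(AB, A_beta, alpha_B, alpha_beta):
    """Yule's Q = [(AB)(alpha beta) - (A beta)(alpha B)] / [(AB)(alpha beta) + (A beta)(alpha B)]."""
    cross, anti = AB * alpha_beta, A_beta * alpha_B
    return (cross - anti) / (cross + anti)


# Illustration 7: N, (A), (B) and (AB) are given.
N, A, B, AB = 1500, 383, 360, 35
A_beta = A - AB                    # (A beta)  = (A) - (AB)
alpha_B = B - AB                   # (alpha B) = (B) - (AB)
alpha_beta = N - A - B + AB        # (alpha beta) = N - (A) - (B) + (AB)
q7 = yule_q(AB, A_beta, alpha_B, alpha_beta)

# Illustration 8: three ultimate frequencies and N are given,
# so (AB) is whatever is left of N.
N8, alpha_B8, A_beta8, alpha_beta8 = 500, 220, 20, 180
AB8 = N8 - alpha_B8 - A_beta8 - alpha_beta8
q8 = yule_q(AB8, A_beta8, alpha_B8, alpha_beta8)

print(alpha_beta, round(q7, 3), AB8, round(q8, 3))
```

The sketch reproduces (αβ) = 792 and Q = −0.606 for Illustration 7, and (AB) = 80 and Q = 0.532 for Illustration 8.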


Criteria of Independence
The following is the list of criteria that may be used in order to
ascertain whether attributes A and B are independent. A and B would
be independent if:
(i) (AB)/(A) = (αB)/(α), or equivalently (Aβ)/(A) = (αβ)/(α)
(ii) (AB)/(B) = (Aβ)/(β), or equivalently (αB)/(B) = (αβ)/(β)
(iii) (AB)/(B) = (A)/N, or (AB)/(A) = (B)/N
(iv) (AB) = (A) × (B)/N
(v) (AB)(αβ) = (Aβ)(αB), or (AB)/(αB) = (Aβ)/(αβ)
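Because these criteria are equivalent, they should all hold or all fail together. The Python sketch below evaluates the main ones on two small 2 × 2 tables. Both tables are hypothetical, made up for illustration, and the sketch assumes every marginal total is positive so that no division by zero occurs. In the dictionary keys, a stands for α and b for β.

```python
from fractions import Fraction as F


def independence_criteria(AB, A_beta, alpha_B, alpha_beta):
    """Evaluate each listed criterion of independence for a 2x2 table.

    All marginal totals are assumed to be positive. Exact fractions are
    used so that the equalities are tested exactly.
    """
    A, alpha = AB + A_beta, alpha_B + alpha_beta
    B, beta = AB + alpha_B, A_beta + alpha_beta
    N = A + alpha
    return {
        "(AB)/(A) = (aB)/(a)": F(AB, A) == F(alpha_B, alpha),
        "(AB)/(B) = (Ab)/(b)": F(AB, B) == F(A_beta, beta),
        "(AB)/(B) = (A)/N": F(AB, B) == F(A, N),
        "(AB)/(A) = (B)/N": F(AB, A) == F(B, N),
        "(AB) = (A)(B)/N": AB == F(A * B, N),
        "(AB)(ab) = (Ab)(aB)": AB * alpha_beta == A_beta * alpha_B,
    }


# Hypothetical independent table: (A) = 40, (B) = 90, N = 120, so (A)(B)/N = 30 = (AB).
print(independence_criteria(30, 10, 60, 20))
# Hypothetical table that is not independent.
print(independence_criteria(30, 10, 50, 20))
```

Every criterion is True for the first table and False for the second, as the equivalence of the criteria requires.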


IV. Coefficient of Colligation

Yule has computed another coefficient called the coefficient of
colligation. It is denoted by the symbol γ and is obtained by applying
the following formula:

$$\gamma = \frac{1 - \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}{1 + \sqrt{\dfrac{(A\beta)(\alpha B)}{(AB)(\alpha\beta)}}}$$

From this coefficient we can obtain Yule's Coefficient of Association,
i.e., Q, as follows:

$$Q = \frac{2\gamma}{1 + \gamma^2}$$

It should be noted that though γ and Q serve the same purpose, these
coefficients are not directly comparable with each other. Further, in
practice Q is more popularly used than γ as a measure of association.
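The relation Q = 2γ/(1 + γ²) can be verified numerically. Writing k for the square root in the formula for γ, we have γ = (1 − k)/(1 + k) and Q = (1 − k²)/(1 + k²), and a little algebra links the two. The Python sketch below checks this on the smoking and tea-drinking figures of Illustration 6. The function names are illustrative, and the sketch assumes (AB)(αβ) is positive so that the ratio under the square root is defined.

```python
from math import sqrt


def yule_q(AB, A_beta, alpha_B, alpha_beta):
    """Yule's coefficient of association Q."""
    cross, anti = AB * alpha_beta, A_beta * alpha_B
    return (cross - anti) / (cross + anti)


def colligation(AB, A_beta, alpha_B, alpha_beta):
    """Yule's coefficient of colligation (gamma); (AB)(alpha beta) must be positive."""
    k = sqrt((A_beta * alpha_B) / (AB * alpha_beta))
    return (1 - k) / (1 + k)


# Figures from Illustration 6 (smoking and tea drinking)
cells = dict(AB=40, A_beta=3, alpha_B=33, alpha_beta=12)
gamma = colligation(**cells)
q_from_gamma = 2 * gamma / (1 + gamma ** 2)
print(round(gamma, 3), round(q_from_gamma, 3), round(yule_q(**cells), 3))
```

Here γ is about 0.375, and converting it with 2γ/(1 + γ²) gives back the same Q (0.658) as the direct formula.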


V. Coefficient of Contingency

So far we have considered cases of dichotomous classification. However,
qualitative data are often classified into more than two classes, i.e.,
attribute A may be classified not as ‘A’ and ‘not A’ but as A1, A2, A3,
etc. Similarly, another attribute B may be subdivided into B1, B2, B3,
etc. The frequencies falling within the different classes may be arranged
in the form of a Contingency Table* as follows:

               B1          B2        ...      Bn          Total
A1           (A1B1)      (A1B2)      ...    (A1Bn)        (A1)
A2           (A2B1)      (A2B2)      ...    (A2Bn)        (A2)
...            ...         ...       ...      ...          ...
Am           (AmB1)      (AmB2)      ...    (AmBn)        (Am)
Total         (B1)        (B2)       ...     (Bn)          N

* A contingency table is a frequency table in which a sample from the population
is classified according to two or more attributes.


For determining the degree of association between the A's and the B's on the
whole, the coefficient of mean square contingency as given by Pearson may
be used. The coefficient of mean square contingency is denoted by the
symbol C and is obtained by applying the following formula:

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$

While finding out the value of C we proceed on the assumption of the null
hypothesis, i.e., that the two attributes are independent and exhibit no
association.

For calculating C we have to determine the value of χ² (pronounced
chi-square)*. The steps in calculating the value of χ² are:

(i) Find the expected or independent frequency for each cell. Thus,
for the cell (A1B1) the expectation is (A1) × (B1)/N.
(ii) Obtain the difference between the observed (actual) and expected
frequencies in each cell, i.e., find (O − E).

(iii) Square (O − E) and divide this figure by E, the expected
frequency, for each cell.

(iv) Add up the figures obtained in step (iii). This gives the
value of χ². Thus

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Once the value of χ² is obtained, it is easy to determine the value of C.


* For a detailed discussion of χ², please refer to the Chapter on ‘χ² test and
goodness of fit’.

Illustration 9. The following table shows the association between weight and
mentality among 1,000 persons. Calculate the coefficient of contingency between
the two.

                               Weight in pounds
Mentality   110-120   120-130   130-140   140-150   150 and upwards    Total
Normal          50       102       198       210          240            800
Weak            30        38        72        30           30            200
Total           80       140       270       240          270          1,000
(M. Com., Agra, 1972)
Solution. Coefficient of contingency, C = √(χ²/(N + χ²)), where χ² = Σ(O − E)²/E.

Denote the mentality classes Normal and Weak by A1 and A2, and the five
weight classes (110-120, 120-130, 130-140, 140-150, 150 and upwards) by
B1, B2, B3, B4 and B5 respectively.

The expected frequency corresponding to the cell (A1B1) is
(A1) × (B1)/N = 800 × 80/1000 = 64.
The expected frequency corresponding to the cell (A1B2) is
(A1) × (B2)/N = 800 × 140/1000 = 112.
The expected frequency corresponding to the cell (A1B3) is
(A1) × (B3)/N = 800 × 270/1000 = 216.
The expected frequency corresponding to the cell (A1B4) is
(A1) × (B4)/N = 800 × 240/1000 = 192.
The expected frequency corresponding to the cell (A1B5) is
(A1) × (B5)/N = 800 × 270/1000 = 216.

The expected frequencies for the Weak row (A2) are obtained in the same way,
or by subtracting the Normal-row values from the column totals. Thus the table
of expected frequencies is:

                    Weight in pounds (B)
Mentality (A)    B1     B2     B3     B4     B5     Total
A1 Normal        64    112    216    192    216       800
A2 Weak          16     28     54     48     54       200
Total            80    140    270    240    270     1,000
Cell        O      E     (O − E)²    (O − E)²/E
(A1B1)     50     64       196         3.0625
(A1B2)    102    112       100         0.893
(A1B3)    198    216       324         1.500
(A1B4)    210    192       324         1.687
(A1B5)    240    216       576         2.667
(A2B1)     30     16       196        12.250
(A2B2)     38     28       100         3.571
(A2B3)     72     54       324         6.000
(A2B4)     30     48       324         6.750
(A2B5)     30     54       576        10.667
                                 χ² = 49.05

C = √(χ²/(N + χ²)) = √(49.05/(1000 + 49.05)) = √0.04676 = 0.216
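Steps (i) to (iv) and the formula for C can be sketched in plain Python and checked against Illustration 9. The function names are illustrative; the sketch assumes the table is supplied as a list of rows of observed frequencies, with every row and column total positive so that no expected frequency is zero.

```python
from math import sqrt


def chi_square(observed):
    """Pearson's chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    N = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, O in enumerate(row):
            E = row_totals[i] * col_totals[j] / N   # step (i): (Ai)(Bj)/N
            chi2 += (O - E) ** 2 / E                # steps (ii) to (iv)
    return chi2, N


def contingency_coefficient(observed):
    """Pearson's coefficient of mean square contingency C."""
    chi2, N = chi_square(observed)
    return sqrt(chi2 / (N + chi2))


table = [
    [50, 102, 198, 210, 240],   # Normal (A1)
    [30, 38, 72, 30, 30],       # Weak (A2)
]
chi2, N = chi_square(table)
C = contingency_coefficient(table)
print(round(chi2, 2), round(C, 3))
```

The sketch gives χ² = 49.05 and C = 0.216, in agreement with the hand computation.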


MISCELLANEOUS ILLUSTRATIONS

Illustration 10. (a) From the following ultimate class frequencies, find the
frequencies of the positive and negative classes and the total number of observations:

(AB) = 100, (αB) = 80, (Aβ) = 50, (αβ) = 40. (B. Com., Mysore, 1973)
Solution. Substituting the given values in the nine-square table:

              A          α          Total
B            100         80          (B)
β             50         40          (β)
Total        (A)        (α)           N

Frequencies of Positive Classes
(A) = (AB) + (Aβ) = 100 + 50 = 150
(B) = (AB) + (αB) = 100 + 80 = 180

Frequencies of Negative Classes
(α) = (αB) + (αβ) = 80 + 40 = 120
(β) = (Aβ) + (αβ) = 50 + 40 = 90

Total number of observations, N = (A) + (α) = 150 + 120 = 270
Solution. Let A denote boys.
∴ α will denote girls.
Let B denote those who passed the examination.
∴ β will denote those who failed.

We are given:
N = 200, (A) = 150, (AB) = 120, (αβ) = 10

Other frequencies can be obtained from the nine-square table:

              A          α          Total
B            120         40          160
β             30         10           40
Total        150         50          200

Applying Yule's method: Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
(AB) = 120, (αβ) = 10, (Aβ) = 30, (αB) = 40

Substituting the values,
Q = [(120 × 10) − (30 × 40)] / [(120 × 10) + (30 × 40)] = (1200 − 1200)/(1200 + 1200) = 0
Hence there is no association between sex and success in the examination.
Illustration 11. In an assortative mating study to find whether tall husbands tend
to marry tall wives, the following information about the wives of 125 tall and 125 short
statured husbands was published:

               Tall husbands    Short husbands
                (per cent)        (per cent)
Tall wives          56               13
Short wives         11               48

Find the coefficient of association between the stature of wives and husbands,
ignoring medium-sized wives.
Solution. Let A denote tall husbands.
∴ α will denote short husbands.

Let B denote tall wives.
∴ β will denote short wives.
Thus (AB) = 56, (Aβ) = 11, (αB) = 13, (αβ) = 48

Applying Yule's method: Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]

= [(56 × 48) − (11 × 13)] / [(56 × 48) + (11 × 13)] = (2688 − 143)/(2688 + 143) = 2545/2831 = 0.899

There is a high degree of positive association between the stature of husbands
and that of wives.

Illustration 12. Calculate Yule's Coefficient of Association between marriage
and failure of students from the following data pertaining to 500 students:

              Passed    Failed    Total
Married          90        65       155
Unmarried       260       110       370
(M. Com., Gorakhpur, 1974)

Solution. Let A denote married persons.
∴ α will denote unmarried persons.
Let B denote those who failed.
∴ β will denote those who passed.

Thus (AB) = 65, (αβ) = 260, (Aβ) = 90, (αB) = 110
Applying Yule's method: Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]

= [(65 × 260) − (90 × 110)] / [(65 × 260) + (90 × 110)] = (16900 − 9900)/(16900 + 9900) = 7000/26800 = 0.261

There is a low degree of positive association between marriage and failure.


Illustration 13. Comment on the following statements:

(a) “99% of the people who drink beer die before reaching 100 years of age.
Therefore, drinking beer is bad for longevity.”

(b) Road accidents resulted in 5,082 deaths in 1962 in India and 8,623 in
1972, while the number of women drivers increased in the period. Hence women make
bad drivers. (C.A. 1972)

Solution. (a) We are not given complete information, i.e., what percentage of
people who do not drink beer die before reaching 100 years of age, and as such the
inference drawn above is wrong. It is possible that 100% of the persons who do not
drink beer die before reaching 100 years of age, in which case drinking beer may be
found to be good for longevity. Therefore, to study the association between A (drinking
beer) and B (dying before 100), in addition to (AB)/(A) it is necessary to know
(αB)/(α) also.

(b) On the basis of the information given it cannot be concluded that
women make bad drivers. Women can be regarded as bad drivers only if the number
of accidents per woman driver is more than the number of accidents per man
driver. These figures are not given. The increase in the number of deaths from road
accidents from 5,082 in 1962 to 8,623 in 1972 may be due not to bad driving but to
other factors, like increase in population, increase in the number of vehicles on the
road, congested roads, etc. Hence the statement is an example of illusory
association.


Association of Three Attributes*

In the case of three attributes, the frequencies are grouped as given
below:

Order 0:   N
Order 1:   (A), (B), (C), (α), (β), (γ)
Order 2:   (AB), (AC), (BC), (Aβ), (Aγ), (Bγ),
           (αB), (αC), (βC), (αβ), (αγ), (βγ)
Order 3:   (ABC), (ABγ), (AβC), (Aβγ), (αBC), (αBγ), (αβC), (αβγ)

*For Post-graduate students only.
Thus in the case of three attributes there are 27 distinct frequencies, as
given above: 1 of order zero, 6 of the first order, 12 of the
second order and 8 of the third.

Any class frequency can always be expressed in terms of class
frequencies of higher order. For the whole number of observations must
clearly be equal to the number of A's added to the number of α's, i.e.,

N = (A) + (α)
Similarly, the number of A's is equal to the number of A's which are B's
added to the number of A's which are β's, i.e.,
(A) = (AB) + (Aβ)
Similarly, (AB) = (ABC) + (ABγ), and so on.
It follows from this result that any class frequency can be expressed
in terms of the frequencies of the highest order, i.e., of order n. For any
frequency can be analysed into higher order frequencies, and the process
need stop only when we have reached the frequencies of the highest order.
For example, with three attributes,

(A) = (AB) + (Aβ) = (ABC) + (ABγ) + (AβC) + (Aβγ)
The classes specified by n attributes, i.e., those of the highest order, are
termed the ultimate classes, and their frequencies the ultimate class frequencies.
Every class frequency can be expressed as the sum of certain of the
ultimate class frequencies. For example,
(B) = (ABC) + (ABγ) + (αBC) + (αBγ)
(C) = (ABC) + (AβC) + (αBC) + (αβC)

In the case of three attributes the following relationships need to be
remembered:

(AB) = (ABC) + (ABγ);   (BC) = (ABC) + (αBC);   (AC) = (ABC) + (AβC)
(Aβ) = (AβC) + (Aβγ);   (Bγ) = (ABγ) + (αBγ);   (Aγ) = (ABγ) + (Aβγ)
(αB) = (αBC) + (αBγ);   (βC) = (AβC) + (αβC);   (αC) = (αBC) + (αβC)
(αβ) = (αβC) + (αβγ);   (βγ) = (Aβγ) + (αβγ);   (αγ) = (αBγ) + (αβγ)
(A) = (AB) + (Aβ);      (B) = (BC) + (Bγ);      (C) = (AC) + (αC)
or (A) = (AC) + (Aγ);   or (B) = (AB) + (αB);   or (C) = (BC) + (βC)
N = (ABC) + (ABγ) + (AβC) + (Aβγ) + (αBC) + (αBγ) + (αβC) + (αβγ)
The following chart reveals the inter-relationship between the
frequencies of the different orders:

N = (A) + (α)
    (A) = (AB) + (Aβ)
        (AB) = (ABC) + (ABγ)
        (Aβ) = (AβC) + (Aβγ)
    (α) = (αB) + (αβ)
        (αB) = (αBC) + (αBγ)
        (αβ) = (αβC) + (αβγ)

Similarly,

N = (B) + (β)
    (B) = (BC) + (Bγ)
        (BC) = (ABC) + (αBC)
        (Bγ) = (ABγ) + (αBγ)
    (β) = (βC) + (βγ)
        (βC) = (AβC) + (αβC)
        (βγ) = (Aβγ) + (αβγ)

And

N = (C) + (γ)
    (C) = (AC) + (αC)
        (AC) = (ABC) + (AβC)
        (αC) = (αBC) + (αβC)
    (γ) = (Aγ) + (αγ)
        (Aγ) = (ABγ) + (Aβγ)
        (αγ) = (αBγ) + (αβγ)
Illustration 14. Given the following frequencies of the Positive class, find the 
frequencies of the rest of the classes : 
(4)=977 ; (AB)=453 ; (4BC) 127 : (B)=1,185 5 (4€) :284 ; N= 1 2/000 ; 
(С)=596.; (BC)=250. 


Solution. In case of three attributes we have 27 distinct class frequencies, of which only 8 are given; we have to find the remaining 19.

First Order Frequencies:
(α) = N - (A) = 12,000 - 977 = 11,023
(β) = N - (B) = 12,000 - 1,185 = 10,815
(γ) = N - (C) = 12,000 - 596 = 11,404

Second Order Frequencies:
The following second order frequencies are to be ascertained: (Aβ), (αB), (αβ), (Bγ), (βC), (βγ), (Aγ), (αC), (αγ).
(Aβ) = (A) - (AB) = 977 - 453 = 524
(αB) = (B) - (AB) = 1,185 - 453 = 732
(αβ) = (α) - (αB) = 11,023 - 732 = 10,291
(Bγ) = (B) - (BC) = 1,185 - 250 = 935
(βC) = (C) - (BC) = 596 - 250 = 346
(βγ) = (β) - (βC) = 10,815 - 346 = 10,469
(Aγ) = (A) - (AC) = 977 - 284 = 693
(αC) = (C) - (AC) = 596 - 284 = 312
(αγ) = (γ) - (Aγ) = 11,404 - 693 = 10,711

Third Order Frequencies:
The third order frequencies to be obtained are (ABγ), (αBC), (αBγ), (AβC), (Aβγ), (αβC) and (αβγ).
(ABγ) = (AB) - (ABC) = 453 - 127 = 326
(αBC) = (BC) - (ABC) = 250 - 127 = 123
(αBγ) = (αB) - (αBC) = 732 - 123 = 609
(AβC) = (AC) - (ABC) = 284 - 127 = 157
(Aβγ) = (Aβ) - (AβC) = 524 - 157 = 367
(αβC) = (βC) - (AβC) = 346 - 157 = 189
(αβγ) = (αβ) - (αβC) = 10,291 - 189 = 10,102
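The repeated subtractions above can be collapsed into one closed form for each ultimate class. The short Python sketch below is an illustration, not the book's method: the function name ultimate_frequencies is our own, and Greek letters in the dictionary keys mark the absence of an attribute. Applied to the data of Illustration 14 it reproduces the third order values just found. A negative result would signal inconsistent data.

```python
def ultimate_frequencies(N, A, B, C, AB, AC, BC, ABC):
    """Return the eight ultimate class frequencies of three attributes.

    Inputs are N and the positive class frequencies (A), (B), (C),
    (AB), (AC), (BC) and (ABC). Greek letters mark absence of an attribute.
    """
    return {
        "ABC": ABC,
        "ABγ": AB - ABC,                      # (AB) - (ABC)
        "AβC": AC - ABC,                      # (AC) - (ABC)
        "αBC": BC - ABC,                      # (BC) - (ABC)
        "Aβγ": A - AB - AC + ABC,             # (Aβ) - (AβC)
        "αBγ": B - AB - BC + ABC,             # (αB) - (αBC)
        "αβC": C - AC - BC + ABC,             # (βC) - (AβC)
        "αβγ": N - A - B - C + AB + AC + BC - ABC,
    }


if __name__ == "__main__":
    # Data of Illustration 14
    freqs = ultimate_frequencies(N=12000, A=977, B=1185, C=596,
                                 AB=453, AC=284, BC=250, ABC=127)
    for cls, f in freqs.items():
        print(f"({cls}) = {f}")
    # The eight ultimate classes must add up to N
    assert sum(freqs.values()) == 12000
```

The same function applied to the data of Illustrations 20 and 22 gives (αβγ) = -57 and -4 respectively, which is how those data are shown to be inconsistent.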


It may be noted that there is more than one way in which any one class frequency can be obtained.

Illustration 15. Given the following data, find the frequencies of (i) the remaining positive classes, and (ii) the ultimate classes.


N = 1,800, (A) = 850, (B) = 780, (C) = 326,

(ABγ) = 200, (AβC) = 94, (αBC) = 72, (ABC) = 50

Solution. We are to find out (i) (AB), (AC) and (BC), and (ii) (Aβγ), (αBγ), (αβC) and (αβγ).
(AB) = (ABC) + (ABγ) = 50 + 200 = 250
(BC) = (ABC) + (αBC) = 50 + 72 = 122
(AC) = (ABC) + (AβC) = 50 + 94 = 144
(Aβγ) = (Aβ) - (AβC) = (A) - (AB) - (AβC) = 850 - 250 - 94 = 506
(αBγ) = (αB) - (αBC) = (B) - (AB) - (αBC) = 780 - 250 - 72 = 458
(αβC) = (βC) - (AβC) = (C) - (BC) - (AβC) = 326 - 122 - 94 = 110
(αβγ) = N - (A) - (B) - (C) + (AB) + (BC) + (AC) - (ABC)

= 1,800 - 850 - 780 - 326 + 250 + 122 + 144 - 50 = 310.


Illustration 16. 100 children took three examinations. 40 passed the first, 39 passed the second and 48 passed the third. 10 passed all three, 9 passed the first two and failed in the third, and 19 failed in the first two and passed the third. Find how many children passed at least two examinations. Show that for the question asked certain of the given frequencies are not necessary. (B.Sc., Kanpur, 1972)

Solution. We are given
(A) = 40, (B) = 39, (C) = 48, (ABC) = 10, (ABγ) = 9, (αβC) = 19, N = 100.
We have to find the number of children who passed at least two examinations, i.e., (ABC) + (ABγ) + (AβC) + (αBC). Of these we are given (ABC) and (ABγ); we have to find (AβC) and (αBC).
(C) = (ABC) + (AβC) + (αBC) + (αβC)
48 = (ABC) + (AβC) + (αBC) + 19
∴ (ABC) + (AβC) + (αBC) = 48 - 19 = 29
∴ (ABC) + (AβC) + (αBC) + (ABγ) = 29 + 9 = 38.
Hence 38 children passed at least two examinations. It should be noted that we require only (C), (αβC) and (ABγ); the other frequencies, (A), (B), (ABC) and N, are not necessary.
Illustration 17. In a very hotly fought battle
70% at least of the combatants lost an eye,
75% at least an ear,
80% at least a leg,
85% at least an arm.
What percentage at least lost all the four organs?
Solution:

Let A stand for combatants who lost an eye, B for those who lost an ear, C for those who lost a leg and D for those who lost an arm.

N stands for the total number of combatants.

Then (A) ≥ 0.70N, (B) ≥ 0.75N, (C) ≥ 0.80N, (D) ≥ 0.85N.

We have to find the least value of (ABCD). The combatants who escaped at least one of the four losses cannot exceed (α) + (β) + (γ) + (δ), so

(ABCD) ≥ N - [(α) + (β) + (γ) + (δ)] = (A) + (B) + (C) + (D) - 3N ≥ 0.70N + 0.75N + 0.80N + 0.85N - 3N = 0.10N.

The least value of (ABCD) = 0.10N, or 10%.

Hence at least 10% of the combatants lost all the four organs.
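The same argument extends to any number of attributes. With k attributes, the proportion possessing all of them is at least the sum of the separate proportions minus (k - 1), and it can never be less than zero. Below is a minimal Python sketch of this lower bound, checked against Illustration 17. The function name least_joint_proportion is illustrative only, and proportions are taken with N = 1.

```python
def least_joint_proportion(proportions):
    """Least possible proportion of the universe possessing every attribute.

    Each entry is the (least) proportion possessing one attribute, with N = 1.
    Those lacking at least one attribute number at most the sum of
    (1 - p) over all attributes, so the joint proportion is at least
    sum(p) - (k - 1); it can never fall below zero.
    """
    k = len(proportions)
    return max(0.0, sum(proportions) - (k - 1))


if __name__ == "__main__":
    # Illustration 17: eye, ear, leg, arm
    bound = least_joint_proportion([0.70, 0.75, 0.80, 0.85])
    print(f"At least {bound:.0%} lost all four")
```

The max(0.0, ...) guard matters for weaker data. For example, with only 30% and 40% for two attributes, the formula gives a negative number, and the least value is simply zero.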

Illustration 18. The following are the proportions per 5,000 of workers observed for certain classes of defects among a number of factory workers:

A = Development defects; B = Nerve signs; C = Mental dullness.

N = 5,000   (C) = 400   (A) = 440   (AB) = 170

(B) = 545   (BC) = 228

Show that some dull workers do not exhibit development defects and state how many at least do not do so.

Solution. It is required to find the least value of (αC).

By the conditions of consistency,

(AC) + (BC) - (AB) ≤ (C), i.e., (AC) ≤ (C) + (AB) - (BC)
Now (C) + (AB) - (BC) = 400 + 170 - 228 = 342
∴ (AC) ≤ 342, and (αC) = (C) - (AC) ≥ 400 - 342 = 58.
Hence at least 58 dull workers do not exhibit development defects.

Illustration 19. At a competitive examination at which 600 graduates appeared, boys outnumbered girls by 96. Those qualifying for interview exceeded in number those failing to qualify by 310. The number of Science graduate boys interviewed was 300, while among the Arts graduate girls there were 25 who failed to qualify for interview. Altogether there were only 135 Arts graduates and 33 among them failed to qualify. Boys who failed to qualify numbered 18.

Find (a) the number of boys who qualified for interview, (b) the total number of Science graduate boys appearing, and (c) the number of Science graduate girls who qualified.


Solution. Let us denote boys by A, qualifying for interview by B and offering Science by C. The data then are:

N = 600              (A) - (α) = 96
(B) - (β) = 310      (γ) = 135
(ABC) = 300          (βγ) = 33
(αβγ) = 25           (Aβ) = 18

We are to find the values of (i) (AB), (ii) (AC), and (iii) (αBC).
N = (A) + (α) = 600; (A) - (α) = 96
∴ 2(A) = 696 or (A) = 348
(B) + (β) = 600; (B) - (β) = 310
∴ 2(B) = 910 or (B) = 455, and (β) = 600 - 455 = 145
(C) = N - (γ) = 600 - 135 = 465
(i) (AB) = (A) - (Aβ) = 348 - 18 = 330
Hence the number of boys who qualified for interview was 330.
(ii) (AC) = (ABC) + (AβC) = (ABC) + (Aβ) - (Aβγ)
= (ABC) + (Aβ) - [(βγ) - (αβγ)] = 300 + 18 - 33 + 25 = 310
Hence the total number of Science graduate boys appearing was 310.
(iii) (αBC) = (BC) - (ABC) = (C) - (βC) - (ABC)
= (C) - [(β) - (βγ)] - (ABC) = 465 - 145 + 33 - 300 = 53
Hence the number of Science graduate girls who qualified was 53.
Consistency of Data. For three attributes the following conditions should be satisfied, otherwise the frequencies given on the right will be negative:

(a) (ABC) ≥ 0                                          otherwise (ABC) will be negative
(b) (ABC) ≥ (AB) + (AC) - (A)                          otherwise (Aβγ) will be negative
(c) (ABC) ≥ (AB) + (BC) - (B)                          otherwise (αBγ) will be negative
(d) (ABC) ≥ (AC) + (BC) - (C)                          otherwise (αβC) will be negative
(e) (ABC) ≤ (AB)                                       otherwise (ABγ) will be negative
(f) (ABC) ≤ (AC)                                       otherwise (AβC) will be negative
(g) (ABC) ≤ (BC)                                       otherwise (αBC) will be negative
(h) (ABC) ≤ (AB) + (AC) + (BC) - (A) - (B) - (C) + N   otherwise (αβγ) will be negative

Since these conditions are not independent, combining each lower limit with an upper limit gives the following four conditions: (i) is obtained by combining (a) and (h), (j) by (b) and (g), (k) by (c) and (f), and (l) by (d) and (e).
(i) (AB) + (AC) + (BC) ≥ (A) + (B) + (C) - N
(j) (AB) + (AC) - (BC) ≤ (A)
(k) (AB) - (AC) + (BC) ≤ (B)
(l) (AC) + (BC) - (AB) ≤ (C)
Or
(AB) + (AC) - (A) ≤ (BC)
(AB) + (BC) - (B) ≤ (AC)
(AC) + (BC) - (C) ≤ (AB)
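When (ABC) is not given, conditions (i) to (l) can be checked mechanically. The Python sketch below is an illustration and not part of the original text. The function name violated_conditions is our own. The function returns the labels of any violated conditions. It checks only these four conditions and assumes that the first and second order frequencies are already non-negative and consistent among themselves.

```python
def violated_conditions(N, A, B, C, AB, AC, BC):
    """Return the labels of conditions (i)-(l) that the data violate.

    An empty list means the four conditions hold; lower-order
    consistency (e.g. (AB) <= (A)) is assumed to have been checked separately.
    """
    conditions = {
        "(i)": AB + AC + BC >= A + B + C - N,
        "(j)": AB + AC - BC <= A,
        "(k)": AB - AC + BC <= B,
        "(l)": AC + BC - AB <= C,
    }
    return [label for label, holds in conditions.items() if not holds]


if __name__ == "__main__":
    # Illustration 21 (manured, irrigated, improved varieties)
    print(violated_conditions(N=1000, A=510, B=490, C=427,
                              AB=189, AC=140, BC=85))
```

For the survey figures of Illustration 21, the function reports only condition (i), because 189 + 140 + 85 = 414 is less than 510 + 490 + 427 - 1,000 = 427. For the data of Illustration 14 it returns an empty list.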
Illustration 20. Do you find any inconsistency in the data given below?
N = 1,000, (Aβ) = 483, (Aγ) = 378, (Bγ) = 226,
(A) = 525, (B) = 312, (C) = 470 and (ABC) = 25
(M. Com., Allahabad 1971)
Solution. In case of three attributes there are eight ultimate class frequencies:
(ABC), (ABγ), (AβC), (Aβγ), (αBC), (αBγ), (αβC), (αβγ)
Before proceeding to evaluate them we find the values of
(AB) = (A) - (Aβ) = 525 - 483 = 42
(AC) = (A) - (Aγ) = 525 - 378 = 147
(BC) = (B) - (Bγ) = 312 - 226 = 86
Now the ultimate frequencies are:
(ABC) = 25 (given)
(ABγ) = (AB) - (ABC) = 42 - 25 = 17
(AβC) = (AC) - (ABC) = 147 - 25 = 122
(Aβγ) = (Aβ) - (AβC) = 483 - 122 = 361
(αBC) = (BC) - (ABC) = 86 - 25 = 61
(αBγ) = (αB) - (αBC) = (B) - (AB) - (αBC) = 312 - 42 - 61 = 209
(αβC) = (C) - (AC) - (BC) + (ABC) = 470 - 147 - 86 + 25 = 262
(αβγ) = N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)
= 1,000 - 525 - 312 - 470 + 42 + 147 + 86 - 25 = -57
As one of the ultimate class frequencies is negative, the given data are inconsistent.
Illustration 21. The following summary appears in a survey covering 1,000 fields. Find out if the data are consistent.

Manured fields                                          510
Irrigated fields                                        490
Fields growing improved varieties                       427
Fields both irrigated and manured                       189
Fields both manured and growing improved varieties      140
Fields both irrigated and growing improved varieties     85

(M. Com. Raj. 1972)

Solution. Let A denote manured fields, B irrigated fields and C fields growing improved varieties. Hence the given data are:
(A) = 510, (B) = 490, (C) = 427, N = 1,000,
(AB) = 189, (AC) = 140, and (BC) = 85
According to the condition of consistency,

(AB) + (AC) + (BC) ≥ (A) + (B) + (C) - N
(AB) + (AC) + (BC) = 189 + 140 + 85 = 414
(A) + (B) + (C) - N = 510 + 490 + 427 - 1,000 = 427
(AB) + (AC) + (BC) is less than 427, so the condition is violated; hence there must be a misprint or mistake of some sort in the given figures.


Illustration 22. Of 1,000 people consulted, 811 liked chocolates, 752 liked toffees and 418 liked sweets; 570 liked chocolates and toffees, 356 liked chocolates and sweets and 348 liked toffees and sweets; 297 liked all three. Show that this information as it stands must be incorrect.

Solution. Denoting liking of chocolates, toffees and sweets by A, B and C respectively, the data would be:
N = 1,000, (A) = 811, (B) = 752, (C) = 418
(AB) = 570, (AC) = 356, (BC) = 348, (ABC) = 297.
If any one of the ultimate class frequencies is negative, the given data would be inconsistent.
(αBC) = (BC) - (ABC) = 348 - 297 = 51
(ABγ) = (AB) - (ABC) = 570 - 297 = 273
(AβC) = (AC) - (ABC) = 356 - 297 = 59
(Aβγ) = (A) - (AB) - (AβC) = 811 - 570 - 59 = 182
(αBγ) = (B) - (AB) - (αBC) = 752 - 570 - 51 = 131
(αβC) = (C) - (BC) - (AβC) = 418 - 348 - 59 = 11
(αβγ) = N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)
= 1,000 - 811 - 752 - 418 + 570 + 356 + 348 - 297 = -4
Hence there is something wrong with the given information.


Partial Association 


So far we have discussed the association of A and B in the universe as a whole without taking account of the other attributes in the universe. It is possible, however, that there is no direct relationship between A and B, i.e., the association between attributes A and B may be due to their association with a third attribute, say, C. Thus if A is positively associated with C and B is also positively associated with C, A may be found to be positively associated with B. But this type of association between A and B is not direct; it is the effect of their association with a third attribute C. To find out whether the association between A and B is real, and not merely due to their association with a third attribute C, it would be necessary to study the association of A and B in the sub-populations C and γ. If A and B are associated in both the sub-populations C and γ, it would indicate that A and B are really associated with each other.


The associations between A and B in the sub-populations are called partial associations, to distinguish them from the total association between A and B in the population at large. The following example will illustrate clearly the concept of partial association:


An association is observed between vaccination and exemption from attack by small-pox. This suggests that vaccination prevents attack of small-pox. However, on a detailed analysis one may find that the attributes vaccination and exemption from small-pox are not directly associated; the association between them is due to a third factor, namely, economic condition. Those people who are economically well off live in better conditions and open houses, get better food and nutrition and can afford the cost of vaccination, and as such the possibility of their getting an attack of small-pox is less. On the other hand, those who are poor live in filthy conditions, dirty surroundings and dirty houses, and because of illiteracy do not believe in vaccination. They cannot afford the cost of vaccination either. As such they are liable to suffer more from diseases. If we denote vaccination by A, exemption from small-pox by B and good economic condition by C, we may find that there is positive association between A and C and also between B and C. Hence, in order to arrive at correct conclusions, it is necessary that on the basis of economic conditions the population is divided into two parts, rich (C) and poor (γ), and in each sub-population association is ascertained between vaccination (A) and exemption from small-pox (B). If this third attribute is ignored it will give rise to misleading conclusions or, technically, illusory association.

The associations found between the attributes A and B in the universe of C's and the universe of γ's are termed partial associations, to distinguish them from the total association found between A and B in the universe at large.

The methods of finding out partial association are the same as those used in finding the total association. The only difference is that we have to find out separately the association of A and B in C and in γ.

FIRST METHOD


Association between A and B    Independence            Positive association    Negative association
For sub-population C           (ABC) = (AC)(BC)/(C)    (ABC) > (AC)(BC)/(C)    (ABC) < (AC)(BC)/(C)
For sub-population γ           (ABγ) = (Aγ)(Bγ)/(γ)    (ABγ) > (Aγ)(Bγ)/(γ)    (ABγ) < (Aγ)(Bγ)/(γ)
SECOND METHOD 



THIRD METHOD (Yule's Coefficient) 
Q_AB.C = [(ABC)(αβC) - (AβC)(αBC)] / [(ABC)(αβC) + (AβC)(αBC)]
Q_AB.C = Coefficient of partial association between A and B for sub-population C

Q_AB.γ = [(ABγ)(αβγ) - (Aβγ)(αBγ)] / [(ABγ)(αβγ) + (Aβγ)(αBγ)]
Q_AB.γ = Coefficient of partial association between A and B for sub-population γ

Association between A and C for sub-population B:
Q_AC.B = [(ABC)(αBγ) - (ABγ)(αBC)] / [(ABC)(αBγ) + (ABγ)(αBC)]

Association between A and C for sub-population β:
Q_AC.β = [(AβC)(αβγ) - (Aβγ)(αβC)] / [(AβC)(αβγ) + (Aβγ)(αβC)]
Illustration 23. Calculate the coefficients of partial association between A and B, and between A and C, from the following data:
(ABC) = 200, (ABγ) = 210, (AβC) = 208, (Aβγ) = 190, (αBC) = 209,
(αBγ) = 105, (αβC) = 170, (αβγ) = 178.
Solution. (i) Between A and B for sub-population C:
Q_AB.C = [(ABC)(αβC) - (AβC)(αBC)] / [(ABC)(αβC) + (AβC)(αBC)]
= [(200 × 170) - (208 × 209)] / [(200 × 170) + (208 × 209)]
= (34,000 - 43,472) / (34,000 + 43,472) = -9,472 / 77,472 = -0.122
(ii) Between A and B for sub-population γ:
Q_AB.γ = [(ABγ)(αβγ) - (Aβγ)(αBγ)] / [(ABγ)(αβγ) + (Aβγ)(αBγ)]
= [(210 × 178) - (190 × 105)] / [(210 × 178) + (190 × 105)]
= (37,380 - 19,950) / (37,380 + 19,950) = 17,430 / 57,330 = +0.304
(iii) Between A and C for sub-population B:
Q_AC.B = [(ABC)(αBγ) - (ABγ)(αBC)] / [(ABC)(αBγ) + (ABγ)(αBC)]
= [(200 × 105) - (210 × 209)] / [(200 × 105) + (210 × 209)]
= (21,000 - 43,890) / (21,000 + 43,890) = -22,890 / 64,890 = -0.353
(iv) Between A and C for sub-population β:
Q_AC.β = [(AβC)(αβγ) - (Aβγ)(αβC)] / [(AβC)(αβγ) + (Aβγ)(αβC)]
= [(208 × 178) - (190 × 170)] / [(208 × 178) + (190 × 170)]
= (37,024 - 32,300) / (37,024 + 32,300) = 4,724 / 69,324 = +0.068
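The four coefficients above differ only in which four ultimate frequencies enter Yule's formula. The minimal Python sketch below takes the eight ultimate class frequencies and returns Q_AB.C, Q_AB.γ, Q_AC.B and Q_AC.β. The function and key names are our own. fractions.Fraction keeps the arithmetic exact until the final rounding. A zero denominator raises ZeroDivisionError, since Q is undefined in that case.

```python
from fractions import Fraction


def yule_q(n11, n22, n12, n21):
    """Yule's Q = (n11*n22 - n12*n21) / (n11*n22 + n12*n21).

    n11, n22 are the 'like' cells and n12, n21 the 'unlike' cells
    of a 2 x 2 table.
    """
    return Fraction(n11 * n22 - n12 * n21, n11 * n22 + n12 * n21)


def partial_q(f):
    """Coefficients of partial association from the ultimate frequencies f."""
    return {
        "Q_AB.C": yule_q(f["ABC"], f["αβC"], f["AβC"], f["αBC"]),
        "Q_AB.γ": yule_q(f["ABγ"], f["αβγ"], f["Aβγ"], f["αBγ"]),
        "Q_AC.B": yule_q(f["ABC"], f["αBγ"], f["ABγ"], f["αBC"]),
        "Q_AC.β": yule_q(f["AβC"], f["αβγ"], f["Aβγ"], f["αβC"]),
    }


if __name__ == "__main__":
    # Data of Illustration 23
    data = {"ABC": 200, "ABγ": 210, "AβC": 208, "Aβγ": 190,
            "αBC": 209, "αBγ": 105, "αβC": 170, "αβγ": 178}
    for name, q in partial_q(data).items():
        print(f"{name} = {float(q):+.3f}")
```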


Illustration 24. A survey of 1,000 companies gives the following data:

(i) Companies with a capital of more than Rs. 10 lakhs                              510
(ii) Companies making profits                                                       490
(iii) Companies under managing agents                                               427
(iv) Companies with a capital of more than Rs. 10 lakhs and making profits          189
(v) Companies with a capital of more than Rs. 10 lakhs and under managing agents    140
(vi) Companies making profits and under managing agents                              85

Show that the information as it stands must be incorrect.
(M. Com. Nagpur, 1974)

Solution:

Let A denote companies with a capital of more than Rs. 10 lakhs,
B denote companies making profits, and
C denote companies under managing agents.
We are given:
N = 1,000, (A) = 510, (B) = 490, (C) = 427
(AB) = 189, (AC) = 140, (BC) = 85
The condition of consistency when the positive class frequencies are given [except (ABC)] is:

(AB) + (AC) + (BC) ≥ (A) + (B) + (C) - N
Substituting the given values:
189 + 140 + 85 = 414, while 510 + 490 + 427 - 1,000 = 427.
Since 414 < 427, the condition is violated. Hence the information as it stands must be incorrect.


Illustration 25. Do you find any association between the tempers of brothers and sisters from the following data?

Good natured brothers and good natured sisters    1,230
Good natured brothers and sullen sisters            850
Sullen brothers and good natured sisters            530
Sullen brothers and sullen sisters                  980

(M. Com. Nagpur, 1973)

Solution:
Let A denote good natured brothers, and B denote good natured sisters.
∴ α will denote sullen brothers, and β will denote sullen sisters.
We are given (AB) = 1,230, (Aβ) = 850, (αB) = 530, (αβ) = 980.
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
Substituting the given values,
Q = [(1,230)(980) - (850)(530)] / [(1,230)(980) + (850)(530)]
= (1,205,400 - 450,500) / (1,205,400 + 450,500) = 754,900 / 1,655,900 = +0.456

We thus find positive association between the tempers of brothers and sisters.
Illustration 26. In a certain class it was found that 70% of the students passed in the half-yearly examination, 30% passed in both the half-yearly and the annual examination, while 28% passed in the annual but failed in the half-yearly examination. Find the percentage of students who:

(a) passed in the annual examination,

(b) passed in the half-yearly but failed in the annual examination, and

(c) failed in both the examinations.

Solution:

Let A denote those passing the annual examination, and

B denote those passing the half-yearly examination.

α and β will represent respectively those who fail in the annual examination and those failing in the half-yearly examination.

We are given: (B) = 70%, (AB) = 30%, (Aβ) = 28%, N = 100.

We are to find out

(i) (A), (ii) (αB), and (iii) (αβ)
(i) (A) = (AB) + (Aβ) = 30 + 28 = 58%
Hence the percentage of students who passed in the annual examination is 58.
(ii) (αB) = (B) - (AB) = 70% - 30% = 40%
Hence the percentage of students who failed in the annual but passed in the half-yearly examination is 40.
(iii) (αβ) = (β) - (Aβ) = N - (B) - (Aβ) = 100 - 70 - 28 = 2
Hence 2% of the students failed in both the examinations.
Illustration 27. In two towns A and B, the following information was supplied by an investigator:

                                       Town A    Town B
Total population (in thousands)           240       234
Literates (,,)                             40        34
Illiterate criminals (,,)                  40        20
Literate criminals (,,)                     5         2

Compare the degree of association between literacy and crime in each of the two towns. (B.A. Hons. Econ., Delhi, 1973)
Solution:
Let A denote literates; ∴ α will denote illiterates.
Let B denote criminals; ∴ β will denote non-criminals.
In terms of these symbols the given information is expressed as below:

          Town A    Town B
N            240       234
(A)           40        34
(αB)          40        20
(AB)           5         2

The missing frequencies can be ascertained by the nine-square table:

Town A
           A             α              Total
B       (AB) = 5      (αB) = 40      (B) = 45
β       (Aβ) = 35     (αβ) = 160     (β) = 195
Total   (A) = 40      (α) = 200      N = 240

Town B
           A             α              Total
B       (AB) = 2      (αB) = 20      (B) = 22
β       (Aβ) = 32     (αβ) = 180     (β) = 212
Total   (A) = 34      (α) = 200      N = 234

Town A:
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)] = [(5 × 160) - (35 × 40)] / [(5 × 160) + (35 × 40)]
= (800 - 1,400) / (800 + 1,400) = -600 / 2,200 = -0.273
Town B:
Q = [(2 × 180) - (32 × 20)] / [(2 × 180) + (32 × 20)] = (360 - 640) / (360 + 640) = -280 / 1,000 = -0.28

Thus there is dissociation between literacy and criminality in both the towns. The degree of dissociation is slightly more in town B compared to town A.
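Filling in the nine-square table and computing Q takes only a few lines. The Python sketch below takes N, (A), (AB) and (αB), which are the four figures given in Illustrations 27 and 28. From these it derives the remaining cells and returns Yule's Q as an exact fraction. The function names nine_square and yule_q are hypothetical.

```python
from fractions import Fraction


def nine_square(N, A, AB, aB):
    """Complete the nine-square table of A and B from N, (A), (AB) and (αB)."""
    alpha = N - A            # (α)
    Abeta = A - AB           # (Aβ)
    abeta = alpha - aB       # (αβ)
    return {
        "AB": AB, "αB": aB, "B": AB + aB,
        "Aβ": Abeta, "αβ": abeta, "β": Abeta + abeta,
        "A": A, "α": alpha, "N": N,
    }


def yule_q(t):
    """Yule's Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]."""
    like = t["AB"] * t["αβ"]
    unlike = t["Aβ"] * t["αB"]
    return Fraction(like - unlike, like + unlike)


if __name__ == "__main__":
    towns = {"Town A": (240, 40, 5, 40), "Town B": (234, 34, 2, 20)}
    for town, args in towns.items():
        table = nine_square(*args)
        print(town, table, f"Q = {float(yule_q(table)):.3f}")
```

Calling the function with the boys' and girls' figures of Illustration 28, nine_square(800, 150, 70, 550) and nine_square(200, 50, 20, 110), gives Q = -37/51 and -25/41, i.e. about -0.725 and -0.61.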


Illustration 28. According to a survey the following results were obtained:

                                              Boys    Girls
Candidates appeared at an examination          800      200
Married                                        150       50
Married and successful                          70       20
Unmarried and successful                       550      110

Find the association between marital status and success at the examination both for boys and girls. (M.A. Econ. Jabalpur, 1975)
Solution : 


Let A denote married; ∴ α will denote unmarried.
Let B denote success; ∴ β will denote failure.
We will find the coefficient of association for boys and for girls separately.

Boys: (A) = 150, (AB) = 70, N = 800, (αB) = 550.

By putting these given values in the nine-square table, we can find the missing ultimate class frequencies.

           A             α              Total
B       (AB) = 70     (αB) = 550     (B) = 620
β       (Aβ) = 80     (αβ) = 100     (β) = 180
Total   (A) = 150     (α) = 650      N = 800

Thus from the table (Aβ) = 80 and (αβ) = 100.
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)] = [(70 × 100) - (80 × 550)] / [(70 × 100) + (80 × 550)]
= (7,000 - 44,000) / (7,000 + 44,000) = -37,000 / 51,000 = -0.725

Girls: (A) = 50, (AB) = 20, N = 200, (αB) = 110.

           A             α              Total
B       (AB) = 20     (αB) = 110     (B) = 130
β       (Aβ) = 30     (αβ) = 40      (β) = 70
Total   (A) = 50      (α) = 150      N = 200

Thus (Aβ) = 30 and (αβ) = 40.
Q = [(20 × 40) - (30 × 110)] / [(20 × 40) + (30 × 110)] = (800 - 3,300) / (800 + 3,300) = -2,500 / 4,100 = -0.61

It is clear from the answer that the marital status and the success at the 
examination are negatively associated both for boys as well as girls. 

Illusory Association. Although a set of attributes independent of A and B will not affect the association between them, the existence of an attribute C with which they are both associated may give an association in the population at large which is illusory in the sense that it does not correspond to any real relationship between them.

Illusory association may also arise in a different way through the personality of the observer or observers. If the observer's attention fluctuates, he may be more likely to notice the presence of A when he notices the presence of B, and vice versa; in such a case A and B (so far as the record goes) will both be associated with the observer's attention C, and consequently an illusory association will be created. Again, if the attributes are not well defined, one observer may be more generous than another in deciding when to record the presence of A and also the presence of B, and even one observer may fluctuate in the generosity of his marking. In this case the recording of A and the recording of B will both be associated with the generosity of the observer in recording their presence, C, and an illusory association between A and B will consequently arise.

SUGGESTED READINGS 
Yule & Kendall: An Introduction to the Theory of Statistics, Chs. I and II.


13 Index Numbers 


Historically the first index was constructed in 1764 to compare the Italian price index in 1750 with the price level in 1500. Though originally developed for measuring the effect of change in prices, index numbers have today become one of the most widely used statistical devices, and there is hardly any field where they are not used. Newspapers headline the fact that prices are going up or down, that industrial production is rising or falling, that imports are increasing or decreasing, or that crimes are rising in a particular period compared to the previous period, as disclosed by index numbers. They are used to feel the pulse of the economy and have come to be used as indicators of inflationary or deflationary tendencies. In fact, they are described as barometers of economic activity, i.e., if one wants to get an idea as to what is happening to an economy, one should look to important indices like the index numbers of industrial production, agricultural production, business activity, etc.

An index number* may be described as a specialized average designed to measure the change in a group of related variables over a period of time. Thus when we say that the index number of wholesale prices is 125 for 1976 compared to 1975, it means that there is a net increase of 25 per cent in the prices of wholesale commodities.

For a proper understanding of the term index number, the following points are worth considering:

(1) Index numbers are specialized averages. As explained in the chapter on measures of central value, an average is a single figure representing a group of figures. However, to obtain an average the items must be comparable; for example, the average weight of men, women and children of a certain locality has no meaning at all. Furthermore, the unit of measurement must be the same for all the items. Thus an average of weights expressed in kg., lb., etc., has no meaning. However, this is not so with index numbers. Index numbers are used for purposes of comparison in situations where two or more series are expressed in different units or the series are composed of different types of items. For example, while constructing a consumer price index the various items are divided into broad heads, namely (i) Food, (ii) Clothing, (iii) Fuel and Lighting, (iv) House Rent, and (v) Miscellaneous. These items are expressed in different units: thus, under the head 'food', wheat and rice may be quoted per quintal, ghee per kg., etc. Similarly, cloth may be measured in terms of metres. An average of all these items expressed in different units is obtained by using the technique of index numbers.
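One way to see why differing units cease to matter is to express each price as a relative to its base-period price (base = 100). A price relative is a pure number, so relatives for different items can be averaged. The Python sketch below uses the unweighted arithmetic mean of price relatives. This is only one of several possible methods, chosen here for simplicity. The prices are hypothetical and are not taken from any published index, and the function name is our own.

```python
def average_of_price_relatives(base_prices, current_prices):
    """Unweighted arithmetic mean of price relatives, base period = 100.

    Each relative 100 * p1 / p0 is free of units, so items quoted per
    quintal, per kg or per metre can be averaged together.
    """
    relatives = [100 * current_prices[item] / base_prices[item]
                 for item in base_prices]
    return sum(relatives) / len(relatives)


if __name__ == "__main__":
    # Hypothetical prices in rupees, each item in its own unit
    base = {"wheat (per quintal)": 100.0, "ghee (per kg)": 20.0,
            "cloth (per metre)": 4.0}
    current = {"wheat (per quintal)": 120.0, "ghee (per kg)": 25.0,
               "cloth (per metre)": 4.0}
    print(average_of_price_relatives(base, current))
```

In this example wheat rises 20%, ghee 25% and cloth not at all, and the index works out to 115. The single figure is a net change that conceals the different movements of the individual items.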


* An index number is a device which shows by its variations the changes in a magnitude which is not capable of accurate measurement in itself or of direct valuation in practice. (Wheldon: Business Statistics)
If the consumer price index of the working class for Delhi has gone up to 125 in 1976 compared to 1975, it means that there is a net increase of 25% in the prices of commodities included in the index. Similarly, if the index of industrial production is 108 in 1976 compared to 1975, it means that there is a net increase in industrial production to the extent of 8%.

It should be carefully noted that even where an index is showing a net increase, it may include some items which have actually decreased in value and others which have remained constant.

(3) Index numbers measure the effect of change over a period of time. Index numbers are most widely used for measuring changes over a period of time. Thus we can find out the net change in agricultural prices from the beginning of the First Plan period to the end of a subsequent Plan period. Similarly, we can compare agricultural production, industrial production, imports, exports, etc., at two different times. However, index numbers not only measure changes over a period of time but also compare economic conditions of different locations, different industries, different cities or different countries. But since the basic problems are essentially the same, and since most of the important index numbers published by the Government and private research organisations refer to data collected at different times, we shall consider in this chapter index numbers measuring change relative to time only. However, the methods described can be applied to other cases also.
Uses of Index Numbers 


Index numbers are indispensable tools of economic and business analysis. Their significance can be best appreciated by the following points:

(1) They help in framing suitable policies. Many of the economic and business policies are guided by index numbers. For example, for deciding the increase in dearness allowance of the employees, the employers have to depend primarily upon the cost of living index. If wages and salaries are not adjusted in accordance with the cost of living, very often it leads to strikes and lock-outs, which in turn cause considerable waste of resources. Index numbers provide guideposts that one can use in making decisions.

Though index numbers are most widely used in evaluating business and economic conditions, there are a large number of other fields also where index numbers are useful. For example, sociologists may speak of population indices; psychologists measure intelligence quotients, which are essentially index numbers comparing a person's intelligence score with that of an average for his or her age; health authorities prepare indices to display changes in the adequacy of hospital facilities; and educational research organisations have devised formulae to measure changes in the effectiveness of school systems.

(2) They reveal trends and tendencies. Since index numbers are most widely used for measuring changes over a period of time, the time series so formed enable us to study the general trend of the phenomenon under study. For example, by examining the index number of imports for India for the last 8-10 years we can say that our imports are showing an upward tendency, i.e., they are rising year after year. Similarly, by examining the index numbers of industrial production, business activity, etc., for the last
few years we can draw conclusions about the trend of production and business activity. By examining the trend of the phenomenon under study we can draw very important conclusions as to how much change is taking place due to the effect of seasonality, cyclical forces, irregular forces, etc. Thus index numbers are highly useful in studying general business conditions.

(3) Index numbers are very useful in deflating.* Index numbers are highly useful in deflating, i.e., they are used to adjust the original data for price changes, or to adjust wages for cost of living changes and thus transform nominal wages into real wages. Moreover, nominal income can be transformed into real income, and nominal sales into real sales, through appropriate index numbers.
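Deflating divides each nominal figure by the corresponding index (base = 100) and multiplies the result by 100. The minimal Python sketch below uses hypothetical wage and cost of living figures, and the function name deflate is our own.

```python
def deflate(nominal_values, price_indices):
    """Convert nominal values to real values at base-period prices.

    real = nominal * 100 / index, where the index for the base period is 100.
    """
    return [v * 100 / i for v, i in zip(nominal_values, price_indices)]


if __name__ == "__main__":
    money_wages = [200, 230, 250]        # hypothetical nominal wages (Rs.)
    cost_of_living = [100, 115, 125]     # hypothetical index, base year = 100
    print(deflate(money_wages, cost_of_living))
```

In this example money wages rise by 25 per cent over the period, but real wages stay at Rs. 200 throughout because the cost of living has risen in the same proportion.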


Classification of Index Numbers 


Index numbers may be classified in terms of what they measure. In economics and business the classifications are: (1) price; (2) quantity; (3) value; and (4) special purpose.

Only price and quantity index numbers are discussed in detail. The others will be mentioned, but without details of how to construct them, since value and special purpose index numbers do not offer new problems in construction. Since the details of construction of all types of index numbers can be understood if the construction of price index numbers is understood, we shall devote major attention to them.


Problems in the Construction of Index Numbers


Before constructing index numbers a careful thought must be given 
to the following problems : 

1. The purpose of the index. At the very outset the purpose of constructing the index must be very clearly decided: what the index is to measure and why. There is no all-purpose index. Every index is of limited and particular use. Thus, a price index that is intended to measure consumers' prices must not include wholesale prices. And if such an index is intended to measure the cost of living of poor families, great care should be taken not to include goods ordinarily used by middle class and upper-income groups. Failure to decide clearly the purpose of the index would lead to confusion and wastage of time with no fruitful results. All other problems, such as the base year, the number of commodities to be included, the prices of the commodities, etc., are decided in the light of the purpose for which the index is being constructed.


The problem of the scope of the index, i.e., the field covered by the …

2. Selection of a base period. Whenever index numbers are constructed, a reference is made to some base period. The base period is the period against which comparisons are made. It may be a year, a month or a day. The index for the base period is always taken as 100. Though the selection of the base period would primarily depend upon the object of the index, the following points need careful consideration in the choice of a base period:

* For details please refer to page 13:34.

† Index numbers may be constructed for a single commodity, called simple index numbers, or for a group of commodities, called composite index numbers.

(i) The base period should be a normal one. The period that is selected as base should be normal, i.e., it should be free from abnormalities like wars, earthquakes, famines, booms, depressions, etc. However, at times it is really difficult to select a year which is normal in all respects; a year which is normal in one respect may be abnormal in another. To solve this problem an average of a number of years, 3 or 4 (preferably covering one complete cycle), may be taken as the base. The process of averaging will reduce the effect of extremes. Thus the average of the period from 1970 to 1972 may be considered normal, whereas no individual year in that span may be considered normal.
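Using an average of several years as the base changes only the denominator: each year's figure is divided by the mean of the base years. The Python sketch below uses hypothetical prices for 1970 to 1973 and takes the average of 1970 to 1972 as the base (= 100). The function name index_on_average_base is our own.

```python
def index_on_average_base(values, base_years):
    """Index numbers with the average of base_years taken as 100."""
    base = sum(values[year] for year in base_years) / len(base_years)
    return {year: 100 * value / base for year, value in values.items()}


if __name__ == "__main__":
    prices = {1970: 90, 1971: 110, 1972: 100, 1973: 120}   # hypothetical
    print(index_on_average_base(prices, base_years=[1970, 1971, 1972]))
```

Here no single base year stands at 100, but the three base-year indices (90, 110 and 100) average to 100. The low figure of 1970 and the high figure of 1971 offset each other.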

(ii) The base period should not be too distant in the past. It is desirable to have an index based on a fairly recent period, since comparisons with a familiar set of circumstances are more helpful than comparisons with vaguely remembered conditions. For example, for deciding the increase in dearness allowance at present there is no advantage in taking 1950 or 1960 as the base; the comparison should be with the preceding year or the year after which dearness allowance has not been revised.

(iii) Fixed base or chain base. While selecting the base, a decision has to be made as to whether the base shall remain fixed or not, i.e., whether we have a fixed base or a chain base index. In the fixed base method, the year or the period of years to which all other prices are related is constant for all times. On the other hand, in the chain base method the prices of a year are linked with those of the preceding year and not with the fixed year. The chain base method generally gives a better picture than what is obtained by the fixed base method. However, much would depend upon the purpose of constructing the index.
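The difference between the two methods can be seen on a single price series. The Python sketch below uses hypothetical prices. It computes fixed base relatives against the first year and link relatives against the preceding year. It then chains the link relatives back to the first year. For one unchanging series, the chained figures coincide with the fixed base figures. In practice the two methods part company when items or weights change from year to year, which the chain base is usually said to accommodate more easily. The function names are our own.

```python
def fixed_base_index(prices):
    """Each year's price relative to the first year (= 100)."""
    return [100 * p / prices[0] for p in prices]


def link_relatives(prices):
    """Each year's price relative to the preceding year; the first year is 100."""
    return [100.0] + [100 * cur / prev for prev, cur in zip(prices, prices[1:])]


def chain_index(links):
    """Chain link relatives back to the first year (= 100)."""
    chained = [links[0]]
    for link in links[1:]:
        chained.append(chained[-1] * link / 100)
    return chained


if __name__ == "__main__":
    prices = [50, 60, 66, 55]            # hypothetical prices (Rs.)
    print("Fixed base:", fixed_base_index(prices))
    print("Link relatives:", link_relatives(prices))
    print("Chained:", chain_index(link_relatives(prices)))
```

The link relatives show the year-to-year movement directly. The last year shows a fall of about 16.7% on the preceding year. The fixed base figure of 110 for the same year shows only that the price is still 10% above the first year.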

3. Selection of number of items. The items included in an index 
should be determined by the purpose for which the index is constructed.
Every item cannot be included while constructing an index number and
hence one has to select a sample. For example, while constructing a
price index it is impossible to include each and every commodity. Hence,
it is necessary to decide what commodities to include. The commodities
should be selected in such a manner that they are representative of the
tastes, habits and customs of the people for whom the index is meant.
Thus in a consumer price index for the working class, items like scooters,
motor cars, refrigerators, cosmetics, etc., find no place. A decision must
also be made on the number of commodities to be included and their 
qualities. Here we should note that the larger the number of commo- 
dities included, the more representative shall be the index but at the same 
time the greater shall be the cost and the time taken. The purpose of 
the index shall help in deciding the number of commodities. Thus, in 
a general price index a larger number of commodities shall have to be 
included as compared to a specific purpose index, such as the index number
of the prices of foodgrains or industrial raw materials.

It is also necessary to decide the grade or quality of the items to
be included in the index. Index numbers shall give wrong results if at one
time one set of qualities is included and at another time another set. To
avoid confusion about qualities it is desirable that as far as possible only
standardized or graded items are included so that they can be easily
identified after a time lapse.


the next problem is to obtain price quotations for these commodities. It
is a well-known fact that prices of many commodities vary from place to
place and even from shop to shop in the same market. It is impracticable 
to obtain price quotations from all the places where a commodity is dealt 


In order to ensure uniformity the manner in which prices are to 
be quoted must also be decided. There are two methods of quoting prices:
(i) money prices, and (ii) quantity prices. In the former case prices are 
quoted per unit of commodity, for example, sugar Rs. 200 per quintal 
(100 kg.) and in the latter case prices are quoted per unit of money. 
Thus, sugar may be quoted as 1/2 kg. for one rupee. The former method 
is free from confusion and is generally adopted while quoting prices. 


retail prices are required. The choice would depend upon the purpose of 
the index. Thus in a consumer price index the wholesale price shall not


5. Choice of an average. Since index numbers are specialized 
averages, a decision has to be made as to which particular average (i.e.,
arithmetic mean, median, mode, geometric mean or harmonic mean) 
should be used for constructing the index. Median, mode and mean are 


(iii) index numbers calculated by using this average are reversible and,
therefore, base shifting is easily possible. The geometric mean index
always satisfies the time reversal test.
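The reversibility claim can be checked numerically. The sketch below uses hypothetical helper functions on the prices of Illustration 1: it computes the unweighted geometric mean index forwards (1975 on 1974) and backwards (1974 on 1975). With each index written as a ratio, their product is 1, which is what the time reversal test requires. The arithmetic mean of relatives is included only for contrast.

```python
import math


def geometric_mean_index(p0, p1):
    """Unweighted geometric mean of price relatives, in per cent."""
    logs = [math.log10(new * 100 / old) for old, new in zip(p0, p1)]
    return 10 ** (sum(logs) / len(logs))


def arithmetic_mean_index(p0, p1):
    """Unweighted arithmetic mean of price relatives, in per cent."""
    relatives = [new * 100 / old for old, new in zip(p0, p1)]
    return sum(relatives) / len(relatives)


prices_1974 = [50, 40, 80, 110, 20]
prices_1975 = [70, 60, 90, 120, 20]

# Time reversal test: P01 x P10 = 1 when both indices are written as ratios.
gm_product = (geometric_mean_index(prices_1974, prices_1975) / 100) * (
    geometric_mean_index(prices_1975, prices_1974) / 100
)
am_product = (arithmetic_mean_index(prices_1974, prices_1975) / 100) * (
    arithmetic_mean_index(prices_1975, prices_1974) / 100
)
print(round(gm_product, 6), round(am_product, 4))  # 1.0 1.0242
```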


Despite theoretical justification for favouring geometric mean, arith-

6. Selection of appropriate weights.* The problem of selecting
suitable weights is quite important and at the same time quite difficult
to decide. The term 'weight' refers to the relative importance of the
different items in the construction of the index. All items are not of
equal importance and hence it is necessary to devise some suitable method
whereby the varying importance of the different items is taken into
account. This is done by allocating weights. Thus we have broadly two
types of indices: unweighted indices and weighted indices. In the former
case, no specific weights are assigned, whereas in the latter case specific
weights are assigned to various items. It may be pointed out here that no
index is unweighted in the strict sense of the term, as weights implicitly
enter in unweighted indices: we are giving equal importance to all the
items and hence all weights are unity. It is, therefore, necessary to adopt
some suitable method of weighting so that arbitrary and haphazard weights
may not affect the results.

* "Weighting is the term used to describe conscious effort to assign to each
commodity an influence that, in the final result, is proportionate to its
relative importance." (Richardson)

There are two methods of assigning weights: (i) implicit, and (ii)
explicit. In the implicit weighting, a commodity or its variety is included
in the index a number of times. Thus, if wheat is to be given in an index
twice as much weight as rice, then two varieties of wheat against one of
rice may be included in the series. On the other hand, in case of explicit
weighting some outward evidence of the importance of the various items in
the index is given. When explicit weights are assigned the questions
are: (i) By what do we weigh? and (ii) What type of weight do we use?

(i) In order to bring out the economic importance of the commodities
involved, the weight can be production figures, consumption figures or
distribution figures.

(ii) Weights are of two types: quantity weights and value weights.
A quantity weight, symbolised by q, means the amount of commodity
produced, distributed or consumed in some time period. A value weight,
on the other hand, combines price with quantity produced, distributed
or consumed. Value is in terms of rupees and is symbolised by p × q, where
p stands for the price and q for the quantity.

Now the question is whether to choose quantity weights or value
weights. The statistician is not free to choose here. If the aggregative
method is used while constructing the index, then quantities are used as
weights because price times quantity will always give the same units,
namely, rupees. On the other hand, in averaging price relatives quantity
figures cannot be used. This is because if we multiply percentages
by quantities expressed in different units, we get results in different units;
for example, percentages times tonnes will give tonnes and percentages times
kgs. will give kgs. Such figures cannot be used in computation. But if
percentages are multiplied by value figures, which are always expressed in
rupees, we get the answer in rupees only. Hence the statistician will use q as
a weight in the method of aggregating actual prices and must use p × q
as a weight in the method of averaging price relatives.

Sometimes in the absence of actual weights arbitrary magnitudes may have to
be used as weights. However, it is unscientific to use these weights and,
therefore, they should be used only in the crudest forms of index numbers.
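The rule about which weight goes with which method can be checked on a small example. This sketch borrows the prices and base year quantities of Illustration 5 (given later in this chapter); the variable names are hypothetical. Aggregating actual prices with quantity weights q and averaging price relatives with value weights p × q (base year values here) lead to the same index.

```python
p0 = [2, 5, 4, 2]     # base year prices
q0 = [8, 10, 14, 19]  # base year quantities
p1 = [4, 6, 5, 2]     # current year prices

# Method of aggregating actual prices: quantity weights q0.
aggregative = sum(p * q for p, q in zip(p1, q0)) * 100 / sum(p * q for p, q in zip(p0, q0))

# Method of averaging price relatives: value weights p0 * q0.
relatives = [new * 100 / old for old, new in zip(p0, p1)]
values = [p * q for p, q in zip(p0, q0)]
averaged = sum(r * v for r, v in zip(relatives, values)) / sum(values)

print(round(aggregative, 2), round(averaged, 2))  # 125.0 125.0
```

Using the bare quantities q0 as weights for the relatives would mix percentages with kilograms, tonnes and so on, which is exactly the difficulty described above.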
Another problem in connection with weights is that of deciding
whether the weights shall be fixed or fluctuating. Since the relative
importance of the different items does not remain the same for all times it
is logical to vary the weights from time to time. Such an index would
give better results. However, when fluctuating weights are used one must
be very careful in interpreting the index because not only changes in prices
but also changes in weights affect the index.

7. Selection of an appropriate formula. A large number of formulae 
have been devised for constructing the index. The problem very often is 
that of selecting the most appropriate formula. The choice of the formula 
would depend not only on the purpose of the index but also on the data 
available. Prof. Irving Fisher has suggested that an appropriate index is 
that which satisfies time reversal test and factor reversal test. Theoreti- 
cally, Fisher's method is considered as “ideal” for constructing index 
numbers. However, from a practical point of view there are certain 
limitations of this index which shall be discussed later. As such, no one 
particular formula can be regarded as the best under all circumstances.
On the basis of this knowledge of the characteristics of different formulae, 
a discriminating investigator will choose technical methods adapted to his 
data and appropriate to his purposes. 

None of the above problems is simple to solve in practice and the
final index is usually the product of compromise between theoretical 
standards and the standards attainable with the given data. 

METHODS OF CONSTRUCTION OF INDEX NUMBERS 

A large number of formulae have been devised for constructing 
index numbers. Broadly speaking, they can be grouped under two heads : 

(a) Unweighted indices; and

(b) Weighted indices. 

In the unweighted indices weights are not expressly assigned whereas
in the weighted indices weights are assigned to the various items. Each
of these types may be further divided under two heads:

(i) Aggregative, and

(ii) Average of Relatives.

The following chart illustrates the various methods : 

Index Numbers
    Unweighted
        Simple Aggregative
        Simple Average of Relatives
    Weighted
        Weighted Aggregative
        Weighted Average of Relatives

A. UNWEIGHTED INDEX NUMBERS 
I. Simple Aggregative Method
This is the simplest method of constructing index numbers. When
this method is used to construct a price index the total of current year
prices for the various commodities in question is divided by the total of
base year prices and the quotient is multiplied by 100. Symbolically,


$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$

where $\sum p_1$ = total of current year prices for various commodities,
$\sum p_0$ = total of base year prices for various commodities.

This method of constructing the index is the simplest of all the
methods. The steps required in computations are:
(i) Add the current year prices for various commodities, i.e., obtain
$\sum p_1$.
(ii) Add the base year prices for the same commodities, i.e., obtain
$\sum p_0$.
(iii) Divide $\sum p_1$ by $\sum p_0$ and multiply the quotient by 100.
Illustration 1. From the following data construct an index for 1975 taking 1974
as base:

Commodities    Prices in 1974 (Rs.)    Prices in 1975 (Rs.)
A                      50                      70
B                      40                      60
C                      80                      90
D                     110                     120
E                      20                      20

Solution:          CONSTRUCTION OF PRICE INDEX

Commodities    Prices in 1974    Prices in 1975
                     p₀                p₁
A                    50                70
B                    40                60
C                    80                90
D                   110               120
E                    20                20
               Σp₀ = 300         Σp₁ = 360

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{360}{300} \times 100 = 120$$

This means that as compared to 1974, in 1975 there is a net increase in the prices
of commodities included in the index to the extent of 20 per cent.
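The three steps of the simple aggregative method fit in a short Python function. This is a sketch, not part of the original text; `simple_aggregative` is a hypothetical name, and the data are those of Illustration 1.

```python
def simple_aggregative(base_prices, current_prices):
    """Simple aggregative price index: (sum of p1 / sum of p0) * 100."""
    return sum(current_prices) * 100 / sum(base_prices)


prices_1974 = [50, 40, 80, 110, 20]  # commodities A to E
prices_1975 = [70, 60, 90, 120, 20]

print(simple_aggregative(prices_1974, prices_1975))  # 120.0
```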
Limitations of this index. There are two main limitations of the
simple aggregative index:
(i) The units in which prices of commodities are given affect the
price index.

(ii) No consideration is given to the relative importance of the
commodities.

This index is based on the assumption that the various items 
and their prices are quoted in the same unit. If the unit of measurement 
is different for different items, the index shall produce vastly divergent 
results. To illustrate, let us consider the problem of calculating an index 
comparing the 1975 cost of construction with that of 1965, and let us 
include only two items, the cost of labour and the price of cement. 


                                              1965         1975
Average hourly wage to be paid to
  construction workers                     Re. 0.50     Re. 0.80
Price of cement per bag                    Rs. 30       Rs. 40

$\sum p_1 = 40.80$, $\sum p_0 = 30.50$

$$P_{01} = \frac{40.80}{30.50} \times 100 = 133.8 \text{ per cent}$$

Now suppose instead of hourly wage we take the weekly wage.
Taking a week of 50 hours we can rewrite the figures as:


Average weekly wage                        Rs. 25       Rs. 40
Price of cement per bag                    Rs. 30       Rs. 40

Now $\sum p_1 = 80$, $\sum p_0 = 55$

$$P_{01} = \frac{80}{55} \times 100 = 145.5 \text{ per cent}$$

One can see at once that when we used weekly wage instead of
hourly wage our index increased from 133.8 to 145.5. Manipulations are
thus possible to suit one's requirements by quoting prices per kg. rather
than per pound or per maund rather than per quintal, and so on. Any
index whose value can be manipulated in this manner can hardly be used
as an objective measure. Moreover, equal importance is given to all the
items, which is wrong. It is for these reasons that the simple aggregative
index has never gained any great degree of acceptance.
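The unit problem can be reproduced directly. In this sketch (hypothetical names, the construction-cost figures above, and the stated 50-hour week) only the unit of the wage changes, yet the simple aggregative index moves by more than ten points.

```python
def simple_aggregative(base_prices, current_prices):
    """Simple aggregative price index: (sum of p1 / sum of p0) * 100."""
    return sum(current_prices) * 100 / sum(base_prices)


HOURS_PER_WEEK = 50

# [wage, price of cement per bag] in 1965 and 1975
hourly_1965, hourly_1975 = [0.50, 30], [0.80, 40]
weekly_1965 = [hourly_1965[0] * HOURS_PER_WEEK, hourly_1965[1]]
weekly_1975 = [hourly_1975[0] * HOURS_PER_WEEK, hourly_1975[1]]

print(round(simple_aggregative(hourly_1965, hourly_1975), 1))  # 133.8
print(round(simple_aggregative(weekly_1965, weekly_1975), 1))  # 145.5
```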


Illustration 2. For the data given below, calculate the index number by taking
(i) 1963 as the base year,
(ii) 1970 as the base year,
(iii) 1963 to 1965 as the base period.

Year    Price of wheat (per kg.)    Year    Price of wheat (per kg.)
1963             4                  1968            10
1964             5                  1969             9
1965             6                  1970            10
1966             7                  1971            11
1967             8
Solution: (i) INDEX NUMBERS TAKING 1963 AS THE BASE YEAR

Year   Price of wheat   Index number         Year   Price of wheat   Index number
       (per kg.)        (1963 = 100)                (per kg.)        (1963 = 100)
1963        4           100                  1968       10           10/4 × 100 = 250
1964        5           5/4 × 100 = 125      1969        9           9/4 × 100 = 225
1965        6           6/4 × 100 = 150      1970       10           10/4 × 100 = 250
1966        7           7/4 × 100 = 175      1971       11           11/4 × 100 = 275
1967        8           8/4 × 100 = 200
(ii) INDEX NUMBERS TAKING 1970 AS THE BASE YEAR

Year   Price of wheat   Index number         Year   Price of wheat   Index number
       (per kg.)        (1970 = 100)                (per kg.)        (1970 = 100)
1963        4           4/10 × 100 = 40      1968       10           10/10 × 100 = 100
1964        5           5/10 × 100 = 50      1969        9           9/10 × 100 = 90
1965        6           6/10 × 100 = 60      1970       10           10/10 × 100 = 100
1966        7           7/10 × 100 = 70      1971       11           11/10 × 100 = 110
1967        8           8/10 × 100 = 80




(iii) INDEX NUMBERS TAKING 1963 TO 1965 AS THE BASE PERIOD
When 1963 to 1965 is to be taken as base it means we have to take an average
of the prices of 1963, 1964 and 1965.

Average = (4 + 5 + 6)/3 = 5.
Since this average equals the 1964 price, 1964 will be taken as 100.

Year   Price of wheat   Index number         Year   Price of wheat   Index number
       (per kg.)        (1963 to 1965               (per kg.)        (1963 to 1965
                        as base)                                     as base)
1963        4           4/5 × 100 = 80       1968       10           10/5 × 100 = 200
1964        5           5/5 × 100 = 100      1969        9           9/5 × 100 = 180
1965        6           6/5 × 100 = 120      1970       10           10/5 × 100 = 200
1966        7           7/5 × 100 = 140      1971       11           11/5 × 100 = 220
1967        8           8/5 × 100 = 160
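The three parts of Illustration 2 differ only in the base price used as the divisor, which a short sketch makes explicit. The helper name `indices` is hypothetical; the prices are those of the illustration.

```python
years = list(range(1963, 1972))
wheat = [4, 5, 6, 7, 8, 10, 9, 10, 11]  # price of wheat per kg


def indices(prices, base_price):
    """Express every price as a percentage of the chosen base price."""
    return [round(p * 100 / base_price) for p in prices]


on_1963 = indices(wheat, wheat[years.index(1963)])
on_1970 = indices(wheat, wheat[years.index(1970)])
base_period_average = sum(wheat[:3]) / 3  # average price of 1963, 1964 and 1965
on_1963_65 = indices(wheat, base_period_average)

print(on_1963)     # [100, 125, 150, 175, 200, 250, 225, 250, 275]
print(on_1970)     # [40, 50, 60, 70, 80, 100, 90, 100, 110]
print(on_1963_65)  # [80, 100, 120, 140, 160, 200, 180, 200, 220]
```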


II. Simple Average of Relatives Method

When this method is used to construct a price index, first of all
price relatives* are obtained for the various items included in the index
and then the average of these relatives is obtained using any one of the
measures of central value, i.e., arithmetic mean, median, mode, geometric
mean or harmonic mean. When arithmetic mean is used for averaging
the relatives, the formula for computing the index is

$$P_{01} = \frac{\sum \left( \frac{p_1}{p_0} \times 100 \right)}{N}$$

where N refers to the number of items (commodities) whose price relatives
are thus averaged.

Although any measure of central value can be used to obtain the
overall index, price relatives are generally averaged either by the
arithmetic or the geometric mean. When geometric mean is used for
averaging the price relatives the formula for obtaining the index becomes

$$\log P_{01} = \frac{\sum \log \left( \frac{p_1}{p_0} \times 100 \right)}{N} \quad \text{or} \quad \log P_{01} = \frac{\sum \log P}{N}, \quad \text{where } P = \frac{p_1}{p_0} \times 100$$

or

$$P_{01} = \text{antilog} \left[ \frac{\sum \log P}{N} \right]$$

Other measures of central value are not in common use for averaging
relatives.

Illustration 3. From the following data construct an index for 1975 taking 1974
as base by the average of relatives method using (a) arithmetic mean, and (b) geometric
mean for averaging relatives.

* A price relative is the price of the current period expressed as a percentage of
the price at the base period.
Commodities    Price in 1974 (Rs.)    Price in 1975 (Rs.)
A                     50                     70
B                     40                     60
C                     80                     90
D                    110                    120
E                     20                     20

Solution: (a) INDEX NUMBER USING ARITHMETIC MEAN OF PRICE RELATIVES

Commodities   Price in 1974 (Rs.)   Price in 1975 (Rs.)   Price relatives
                     p₀                    p₁              (p₁/p₀) × 100
A                    50                    70                 140.0
B                    40                    60                 150.0
C                    80                    90                 112.5
D                   110                   120                 109.1
E                    20                    20                 100.0
                                                  Σ(p₁/p₀ × 100) = 611.6

$$P_{01} = \frac{\sum \left( \frac{p_1}{p_0} \times 100 \right)}{N} = \frac{611.6}{5} = 122.32$$


(b) INDEX NUMBER USING GEOMETRIC MEAN OF PRICE RELATIVES

Commodities   Price in 1974   Price in 1975   Price relative    log P
                   p₀              p₁               P
A                  50              70             140.0         2.1461
B                  40              60             150.0         2.1761
C                  80              90             112.5         2.0512
D                 110             120             109.1         2.0378
E                  20              20             100.0         2.0000
                                                       Σ log P = 10.4112

$$P_{01} = \text{antilog} \left[ \frac{\sum \log P}{N} \right] = \text{antilog} \left[ \frac{10.4112}{5} \right] = \text{antilog } 2.0822 = 120.84$$
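Both averages of Illustration 3 can be computed from the same list of price relatives. A sketch using only the standard library; the variable names are hypothetical. The geometric mean is taken, as in the formula above, as the antilog of the mean of the logarithms.

```python
import math

p0 = [50, 40, 80, 110, 20]  # 1974 prices, commodities A to E
p1 = [70, 60, 90, 120, 20]  # 1975 prices

relatives = [new * 100 / old for old, new in zip(p0, p1)]

# (a) arithmetic mean of price relatives
am_index = sum(relatives) / len(relatives)

# (b) geometric mean: antilog of the mean of the logarithms of the relatives
gm_index = 10 ** (sum(math.log10(r) for r in relatives) / len(relatives))

print(round(am_index, 2), round(gm_index, 2))  # 122.32 120.85
```

Working at full precision, the geometric result can differ in the second decimal from a hand computation with four-figure log tables.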

Although arithmetic mean and geometric mean have both been used,
the arithmetic mean is often preferred because it is easier to compute and
much better known. Some economists, notably F. Y. Edgeworth, have
preferred to use the median, which is not affected by a single extreme
value. Since the argument is important only when an index is based on
a very small number of commodities, it generally carries little
weight and the median is seldom used in actual practice.


Illustration 4. Prepare index numbers of prices for three years with average
price as base from the data given below:

                      Rate per rupee
             Wheat       Cotton       Oil
1st year     10 kg.      4 kg.        3 kg.
2nd year      9 kg.      3½ kg.       3 kg.
3rd year      9 kg.      3 kg.        2½ kg.

Solution. First convert the prices into rupees for 40 kg. Then determine the
average price and, with this average price as base, compute relatives.
[Base: Average price (in rupees) = 100]

Commodities  Unit     Average price   First year      Second year     Third year
                      (base = 100)    Price    P      Price    P      Price    P
Wheat        40 kg.        4.3         4.0     93      4.4    102      4.4    102
Cotton       40 kg.       11.6        10.0     86     11.4     98     13.3    115
Oil          40 kg.       14.2        13.3     94     13.3     94     16.0    113
Total of relatives                            273             294             330
Average of relatives                           91              98             110

Note. P indicates price relatives.
For 1st year 10 kg. wheat cost Re. 1.
Hence 40 kg. wheat will cost (1/10) × 40 = Rs. 4.
Similarly other prices are obtained.
4.3 is the average of 4, 4.4 and 4.4, i.e., the respective prices of wheat in the three years.
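The conversion from quantity prices (kg per rupee) to money prices (rupees per 40 kg) and the averaging can be sketched as follows. The data structure and names are hypothetical. Unlike the table, the code does not round the prices to one decimal before forming relatives, so its year averages come out slightly different (the second year lands near 98.6 rather than 98).

```python
UNIT_KG = 40  # prices are compared per 40 kg

# Kilograms obtainable for one rupee in years 1, 2 and 3.
kg_per_rupee = {
    "Wheat": [10, 9, 9],
    "Cotton": [4, 3.5, 3],
    "Oil": [3, 3, 2.5],
}

relatives_by_year = [[], [], []]
for item, rates in kg_per_rupee.items():
    prices = [UNIT_KG / kg for kg in rates]  # quantity price -> rupees per 40 kg
    base = sum(prices) / len(prices)         # average price taken as 100
    for year, price in enumerate(prices):
        relatives_by_year[year].append(price * 100 / base)

# Simple arithmetic mean of the three relatives in each year.
index = [sum(r) / len(r) for r in relatives_by_year]
print([round(x, 1) for x in index])  # [91.1, 98.6, 110.3]
```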


Merits and Limitations of the Method 


Merits. This method has the following two advantages over the
previous method:

1. Extreme items do not influence the index. Equal importance is
given to all the items.

2. The index is not influenced by the units in which prices are 
quoted or by the absolute level of individual prices. Relatives are pure 
numbers and are, therefore, divorced from the original units. Conse- 
quently, index numbers computed by the relative method would be the 
same regardless of the way in which prices are quoted. This simple 
average of price relatives is said to meet what is called the units test.* 

Limitations. Despite these merits this method is not very satisfac- 
tory because of two reasons : 

1. Difficulty is faced with regard to the selection of an appropriate
average. The use of the arithmetic mean is sometimes considered
questionable because it has an upward bias. The use of geometric mean
involves difficulties of computation. Other averages are almost never
used while constructing index numbers.

2. The relatives are assumed to have equal importance. This is 
again a kind of concealed weighting system that is highly objectionable 
since economically some relatives are more important than others. 


B. WEIGHTED INDEX NUMBERS 


The so-called unweighted index numbers discussed above are not 
unweighted in the true sense of the term. They assign equal importance 
to all the items included in the index and as such they are in reality 
weighted, weights being implicit rather than explicit. As discussed earlier, 
in case of unweighted indices it is possible to get different results by 
changing the importance of different items by quoting prices relative to 
different units. Implicit weighting (or the unweighted index) is far from
realistic in most of the cases. Construction of useful index numbers
requires a conscious effort to assign to each commodity a weight in accor-
dance with its importance in the total phenomenon that the index is
supposed to describe.

* Please see under "Tests of Adequacy of Index Number Formulae".

Weighted index numbers are of two types:

I. Weighted Aggregative Indices, and

II. Weighted Average of Relatives Indices.

I. Weighted Aggregative Index Numbers

These indices are of the simple aggregative type with the funda-
mental difference that weights are assigned to the various items included
in the index. There are various methods of assigning weights and
consequently a large number of formulae for constructing index numbers
have been devised, of which some of the more important ones are:
1. Laspeyres method,
2. Paasche method,
3. Dorbish and Bowley's method,
4. Fisher's ideal method,
5. Marshall-Edgeworth method, and
6. Kelly's method.

All these methods carry the names of persons who have suggested
them.

1. Laspeyres Method.* In this method the base year quantities are
taken as weights. The formula for constructing the index is:

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Steps. (i) Multiply the current year prices of various commodities
with base year weights and obtain $\sum p_1 q_0$.

(ii) Multiply the base year prices of various commodities with base
year weights and obtain $\sum p_0 q_0$.

(iii) Divide $\sum p_1 q_0$ by $\sum p_0 q_0$ and multiply the quotient by 100. This
gives us the price index.

Laspeyres index attempts to answer the question "What is the
change in aggregate value of the base period list of goods when valued at
given period prices?" This index is very widely used in practical work.
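The three steps translate line for line into a function. This is a sketch with a hypothetical function name, applied to the 1965 and 1975 data of Illustration 5 below.

```python
def laspeyres(p0, q0, p1):
    """Laspeyres price index: sum(p1*q0) / sum(p0*q0) * 100."""
    numerator = sum(p * q for p, q in zip(p1, q0))    # step (i)
    denominator = sum(p * q for p, q in zip(p0, q0))  # step (ii)
    return numerator * 100 / denominator              # step (iii)


p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]  # 1965 prices and quantities
p1 = [4, 6, 5, 2]                       # 1975 prices

print(laspeyres(p0, q0, p1))  # 125.0
```

Only base year quantities are needed, which is the practical advantage discussed below.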

2. Paasche Method.† In this method the current year quantities are
taken as weights. The formula for constructing the index is:

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

Steps. (i) Multiply current year prices of various commodities with
current year weights and obtain $\sum p_1 q_1$.

(ii) Multiply the base year prices of various commodities with
current year weights and obtain $\sum p_0 q_1$.

(iii) Divide $\sum p_1 q_1$ by $\sum p_0 q_1$ and multiply the quotient by 100.
In general this formula answers the question "What would be the
value of the given period list of goods when valued at base-period prices?"
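A matching sketch for the Paasche formula (hypothetical function name, data of Illustration 5) shows that the only change from Laspeyres is the set of quantities used as weights.

```python
def paasche(p0, p1, q1):
    """Paasche price index: sum(p1*q1) / sum(p0*q1) * 100."""
    numerator = sum(p * q for p, q in zip(p1, q1))    # step (i)
    denominator = sum(p * q for p, q in zip(p0, q1))  # step (ii)
    return numerator * 100 / denominator              # step (iii)


p0 = [2, 5, 4, 2]                     # 1965 prices
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]  # 1975 prices and quantities

print(round(paasche(p0, p1, q1), 1))  # 126.2
```

With these data the Paasche figure (126.2) is slightly above the Laspeyres figure (125.0), a reminder that the usual bias argument does not guarantee that Laspeyres is always the larger.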


* This method was devised by Laspeyres in 1871 and that is why it is so called.
† This is named after the German statistician Paasche, who first used it in 1874.
Comparison of Laspeyres and Paasche methods. From a practical
point of view, Laspeyres index is often preferred to Paasche's for the
simple reason that in Laspeyres index the weights ($q_0$) are the base year
quantities and do not change from one period to the next. On the other
hand, the use of Paasche index requires the continuous use of new quantity
weights for each period considered and in most cases these weights are
difficult and expensive to obtain. In most countries index numbers are
constructed by using Laspeyres formula.


An interesting property of Laspeyres and Paasche indices is that the 
former is generally expected to overestimate, i.e., to show an upward bias,
whereas the latter tends to underestimate, i.e., shows a downward bias. 
When the prices increase there is usually a reduction in the consumption 
of those items for which the increase has been the most pronounced and, 
hence, by using base year quantities we will be giving too much weight to 
the prices that have increased the most and the numerator of the 
Laspeyres index will be too large. When the prices go down, consumers 
often shift their preference to those items which have declined the most and, 
hence, by using base period weights in the numerator of the Laspeyres 
index we shall not be giving sufficient weight to the prices that have gone 
down the most and the numerator will again be too large. Similarly 
because people tend to spend less on goods when their prices are rising 
the use of the Paasche or current weighting produces an index which 
tends to underestimate the rise in prices, i.e., it has a downward bias. But 
the above arguments do not imply that Laspeyres index must necessarily 
be larger than the Paasche's. 


Unless drastic changes have taken place between the base year and 
the given year, the difference between the Laspeyres’ and Paasche's will 
generally be small and either could serve as satisfactory measure. In 
practice, however, the base year weighted Laspeyres’ type index remains 
the most popular for reasons of its practicability. The Paasche type index 
can only be constructed when up-to-date data for the weights are available. 
Furthermore, the price index of a given year can be compared only with
the base year. For example, suppose a Paasche index stands at 100 in the
base year, 130 in one later year and 145 in another. Then the last two
indices are using different weights and cannot be compared with each
other. If these indices had been obtained by the Laspeyres formula they
could be compared because in that case the weights are the same base year
weights ($q_0$). For these reasons, in practice the Paasche formula is usually
not used and the Laspeyres type index remains most popular for reasons
of its practicability.

3. Dorbish and Bowley's Method. Dorbish and Bowley have sugges- 
ted simple arithmetic mean of the two indices (Laspeyres and Paasche) 
mentioned above so as to take into account the influence of both the 
periods, i.e., current as well as base periods. The formula for constructing 
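the index is:

$$P_{01} = \frac{L + P}{2}, \quad \text{where } L = \text{Laspeyres index}, \; P = \text{Paasche index}$$

or

$$P_{01} = \frac{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} + \dfrac{\sum p_1 q_1}{\sum p_0 q_1}}{2} \times 100$$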
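The Dorbish and Bowley index is simply the mean of the two earlier indices. A sketch with a hypothetical helper `weighted_aggregate`, applied to the data of Illustration 5:

```python
def weighted_aggregate(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]  # 1965
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]   # 1975

L = weighted_aggregate(p1, q0) * 100 / weighted_aggregate(p0, q0)  # Laspeyres
P = weighted_aggregate(p1, q1) * 100 / weighted_aggregate(p0, q1)  # Paasche
bowley = (L + P) / 2

print(round(L, 1), round(P, 1), round(bowley, 1))  # 125.0 126.2 125.6
```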
4. Fisher's 'Ideal' Index. Prof. Irving Fisher has given a number of
formulae for constructing index numbers and of these he calls one the
'ideal' index. The Fisher's Ideal Index is given by the formula:

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 \quad \text{or} \quad P_{01} = \sqrt{L \times P}$$

It shall be clear from the above formula that Fisher's Ideal Index is
the geometric mean of the Laspeyres and Paasche indices. Thus in the
Fisher method we average geometrically formulae that err in opposite
directions.

The above formula is known as "ideal" because of the following
reasons:

(i) It is based on the geometric mean which is theoretically consi-
dered to be the best average for constructing index numbers.

(ii) It takes into account both current year as well as base year
prices and quantities.

(iii) It satisfies both the time reversal test as well as the factor
reversal test as suggested by Fisher.*

(iv) It is free from bias. The two formulae (Laspeyres' and
Paasche's) that embody the opposing type and weight biases are, in the
ideal formula, crossed geometrically, i.e., by an averaging process that of
itself has no bias. The result is the complete cancellation of biases of the
kinds revealed by time reversal and factor reversal tests.

It is not, however, a practical index to compute because it is
excessively laborious. The data, particularly for the Paasche segment of
the index, are not readily available. In practice, statisticians will continue
to rely upon simple, although perhaps less exact, index number formulae.
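Claim (iii) can be checked numerically on the data of Illustration 5. In this sketch (hypothetical helper names) the time reversal test interchanges the two periods, and the factor reversal test interchanges prices and quantities; the product of the price and quantity indices should equal the value ratio Σp₁q₁ / Σp₀q₀.

```python
import math


def agg(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def fisher(p0, q0, p1, q1):
    """Fisher's ideal index as a ratio (multiply by 100 for per cent)."""
    return math.sqrt(agg(p1, q0) / agg(p0, q0) * agg(p1, q1) / agg(p0, q1))


p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]  # 1965
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]   # 1975

P01 = fisher(p0, q0, p1, q1)  # price index, 1975 on 1965
P10 = fisher(p1, q1, p0, q0)  # periods interchanged
Q01 = fisher(q0, p0, q1, p1)  # prices and quantities interchanged

print(round(P01 * 100, 1))                                 # 125.6
print(math.isclose(P01 * P10, 1))                          # time reversal: True
print(math.isclose(P01 * Q01, agg(p1, q1) / agg(p0, q0)))  # factor reversal: True
```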

5. Marshall-Edgeworth Method. In this method also both the
current year as well as base year prices and quantities are considered.
The formula for constructing the index is:

$$P_{01} = \frac{\sum (q_0 + q_1) p_1}{\sum (q_0 + q_1) p_0} \times 100$$

or, opening the brackets,

$$P_{01} = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$

It is a simple, readily constructed measure, giving a very close
approximation to the results obtained by the ideal formula.
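A sketch of the Marshall-Edgeworth formula in its bracketed form, with a hypothetical function name and the data of Illustration 5. The result (125.48) is indeed close to the Fisher figure for the same data (125.6).

```python
def marshall_edgeworth(p0, q0, p1, q1):
    """Price index weighted by q0 + q1 for each commodity, in per cent."""
    weights = [a + b for a, b in zip(q0, q1)]
    current = sum(p * w for p, w in zip(p1, weights))
    base = sum(p * w for p, w in zip(p0, weights))
    return current * 100 / base


p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]  # 1965
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]   # 1975

print(round(marshall_edgeworth(p0, q0, p1, q1), 2))  # 125.48
```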

6. Kelly's Method. Truman L. Kelly has suggested the following
formula for constructing index numbers:

$$P_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$$

Here weights are the quantities which may refer to some period, not
necessarily the base year or current year. Thus the average quantity of
two or more years may be used as weights. If in Kelly's formula the
average of the quantities of two years is used as weights, the formula
becomes

$$P_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$$

where q is the arithmetic mean of the quantities of the two years.

* For proof please see under "Tests of Adequacy of Index Numbers".
Similarly the average of the quantities of three or more years can
be used as weights. This method is known as the fixed weight aggregative
index and is currently in great favour in the construction of index number
series. An important advantage of this formula is that, like Laspeyres'
index, it does not demand yearly changes in the weights. Moreover,
the base period can be changed without necessitating a corresponding change
in the weights. This is very important because the construction of appro-
priate quantity weights for a general purpose index usually requires a
considerable amount of work. Weights can thus be kept constant until
new census (or other survey) data become available to revise the index.
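Kelly's index takes any fixed set of quantities as weights. The sketch below (hypothetical function name, Illustration 5 data) shows two special cases: base year quantities reproduce the Laspeyres value, and the average of the two years' quantities reproduces the Marshall-Edgeworth value, since halving every weight cancels between numerator and denominator.

```python
def kelly(p0, p1, q):
    """Fixed weight aggregative index: sum(p1*q) / sum(p0*q) * 100."""
    return sum(a * w for a, w in zip(p1, q)) * 100 / sum(b * w for b, w in zip(p0, q))


p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]  # 1965
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]   # 1975
q_avg = [(a + b) / 2 for a, b in zip(q0, q1)]  # average quantity of the two years

print(kelly(p0, p1, q0))               # 125.0, same as Laspeyres
print(round(kelly(p0, p1, q_avg), 2))  # 125.48, same as Marshall-Edgeworth
```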


Illustration 5. Construct index numbers of price from the following data by
applying:
1. Laspeyres' method,
2. Paasche's method,
3. Bowley's method,
4. Fisher's Ideal method, and
5. Marshall-Edgeworth method.

                    1965                  1975
Commodities   Price   Quantity     Price   Quantity
A               2        8           4        6
B               5       10           6        5
C               4       14           5       10
D               2       19           2       13
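Solution:          CALCULATION OF VARIOUS INDICES

                  1965            1975
Commodities    p₀     q₀       p₁     q₁       p₁q₀     p₀q₀     p₁q₁     p₀q₁
A              2      8        4      6         32       16       24       12
B              5     10        6      5         60       50       30       25
C              4     14        5     10         70       56       50       40
D              2     19        2     13         38       38       26       26
                                              Σp₁q₀    Σp₀q₀    Σp₁q₁    Σp₀q₁
                                              = 200    = 160    = 130    = 103

1. Laspeyres' Method: $P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$; where $\sum p_1 q_0 = 200$, $\sum p_0 q_0 = 160$

$$P_{01} = \frac{200}{160} \times 100 = 125$$

2. Paasche's Method: $P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$; where $\sum p_1 q_1 = 130$, $\sum p_0 q_1 = 103$

$$P_{01} = \frac{130}{103} \times 100 = 126.2$$

3. Bowley's Method:

$$P_{01} = \frac{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} + \dfrac{\sum p_1 q_1}{\sum p_0 q_1}}{2} \times 100 = \frac{\dfrac{200}{160} + \dfrac{130}{103}}{2} \times 100 = \frac{1.250 + 1.262}{2} \times 100 = 125.6$$

Or $P_{01} = \frac{L + P}{2} = \frac{125 + 126.2}{2} = 125.6$

4. Fisher's Ideal Method:

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{200}{160} \times \frac{130}{103}} \times 100 = \sqrt{1.578} \times 100 = 1.256 \times 100 = 125.6$$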
5. Marshall-Edgeworth Method:

$$P_{01} = \frac{\sum (q_0 + q_1) p_1}{\sum (q_0 + q_1) p_0} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100 = \frac{200 + 130}{160 + 103} \times 100 = \frac{330}{263} \times 100 = 125.48$$


Illustration 6. Given the following data, what index number will you use for
purposes of comparison? Give reasons.

               Rice               Wheat              Jowar
Year      Price  Quantity    Price  Quantity    Price  Quantity
1966       9.3     100        6.4     11         5.1      5
1967       4.5      90        3.7     10         2.7      3

(B. Com., Tamil Nadu, 1973)
Solution. Since we are given both current as well as base year prices and
quantities, Fisher's Ideal Index shall be most appropriate here.


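CALCULATION OF FISHER'S IDEAL INDEX

                  1966            1967
Commodities    p₀     q₀       p₁     q₁      p₁q₀       p₀q₀       p₁q₁      p₀q₁
Rice          9.3    100      4.5     90     450.0      930.0      405.0     837.0
Wheat         6.4     11      3.7     10      40.7       70.4       37.0      64.0
Jowar         5.1      5      2.7      3      13.5       25.5        8.1      15.3
                                           Σp₁q₀      Σp₀q₀      Σp₁q₁     Σp₀q₁
                                           = 504.2    = 1,025.9  = 450.1   = 916.3

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

$\sum p_1 q_0 = 504.2$, $\sum p_0 q_0 = 1025.9$, $\sum p_1 q_1 = 450.1$, $\sum p_0 q_1 = 916.3$.

Substituting the values,

$$P_{01} = \sqrt{\frac{504.2}{1025.9} \times \frac{450.1}{916.3}} \times 100 = \sqrt{0.2414} \times 100 = 0.491 \times 100 = 49.1$$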
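The arithmetic of Illustration 6 can be checked with a few lines of Python. The helper name `agg` is hypothetical; the prices and quantities are those of the table above, listed in the order rice, wheat, jowar.

```python
import math


def agg(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Rice, wheat, jowar
p0, q0 = [9.3, 6.4, 5.1], [100, 11, 5]  # 1966
p1, q1 = [4.5, 3.7, 2.7], [90, 10, 3]   # 1967

fisher = math.sqrt(agg(p1, q0) / agg(p0, q0) * agg(p1, q1) / agg(p0, q1)) * 100
print(round(fisher, 1))  # 49.1
```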
Illustration 7. Calculate the weighted price index from the following data:

Materials       Unit         Quantity       Price during
                             required       1963       1973
                                            Rs.        Rs.
Cement          100 lb.      500 lb.        5.0        8.0
Timber          c.ft.        2,000 c.ft.    9.5       14.2
Steel sheets    cwt.         50 cwt.       34.0       42.0
Bricks          per '000     20,000        12.0       24.0

(B.A. Hons. Econ., Delhi, 1973)
Solution. Since the weights here are fixed weights neither relating to current
year nor relating to base year, we apply Kelly's method for computing the index.

CALCULATION OF WEIGHTED PRICE INDEX

Materials      Unit        Quantity q          Price during
                           (in price units)    1963 (p₀)   1973 (p₁)      p₀q         p₁q
Cement         100 lb.     5 (= 500 lb.)          5.0          8.0           25          40
Timber         c.ft.       2,000                  9.5         14.2       19,000      28,400
Steel sheets   cwt.        50                    34.0         42.0        1,700       2,100
Bricks         per '000    20 (= 20,000)         12.0         24.0          240         480
                                                                        Σp₀q        Σp₁q
                                                                        = 20,965    = 31,020

$$P_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100; \quad \sum p_1 q = 31020, \; \sum p_0 q = 20965$$

Substituting the values,

$$P_{01} = \frac{31020}{20965} \times 100 = 147.96$$
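The one point that needs care in Illustration 7 is expressing each quantity in the same unit as its price (cement per 100 lb., bricks per thousand). A sketch of the fixed weight calculation, with a hypothetical dictionary layout:

```python
# name: (quantity in the unit the price refers to, price 1963, price 1973)
materials = {
    "Cement": (500 / 100, 5.0, 8.0),       # 500 lb. = 5 units of 100 lb.
    "Timber": (2000, 9.5, 14.2),
    "Steel sheets": (50, 34.0, 42.0),
    "Bricks": (20000 / 1000, 12.0, 24.0),  # 20,000 bricks = 20 thousands
}

sum_p0q = sum(q * p0 for q, p0, _ in materials.values())
sum_p1q = sum(q * p1 for q, _, p1 in materials.values())
index = sum_p1q * 100 / sum_p0q

print(round(sum_p0q), round(sum_p1q), round(index, 2))  # 20965 31020 147.96
```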
II. Weighted Average of Relatives

In the weighted aggregative methods discussed above price relatives
were not computed. However, like the unweighted relative method, it is also
possible to compute a weighted average of relatives. For purposes of
averaging we may use either the arithmetic mean or the geometric mean.
The steps in the computation of the weighted arithmetic mean of relatives
index number are as follows:

(i) Express each item of the period for which the index number is 
being calculated as a percentage of the same item in the base period. 


(ii) Multiply the percentages as obtained in step (i) for each item 
by the weight which has been assigned to that item. 

(iii) Add the results obtained from the several multiplications carried 
out in step (ii). 

(iv) Divide the sum obtained in step (iii) by the sum of the weights
used. The result is the index number. Symbolically,

$$P_{01}=\frac{\sum PV}{\sum V}$$

where P = price relative, i.e., (p1/p0) × 100,
and V = value weights*, i.e., p0q0.
Instead of using arithmetic mean the geometric mean may be used 
for averaging relatives. The weighted geometric mean of relatives is com- 
puted in the same manner as the unweighted geometric mean of relatives 
index number except that weights are introduced by applying them to the 
logarithms of the relatives. When this method is used the formula for
computing the index is:

$$P_{01}=\text{Antilog}\left[\frac{\sum V\log P}{\sum V}\right];\quad\text{where } P=\frac{p_1}{p_0}\times 100$$

and V = value weight, i.e., p0q0, for each item.


Steps. (i) Obtain percentage relatives for each item.

(ii) Find the logarithm of each percentage relative found in step (i).

(iii) Multiply the logarithms by the weights assigned.

(iv) Add the results obtained in step (iii).

(v) Divide the total obtained in step (iv) by the sum of the weights.

(vi) Find the antilogarithm of the quotient obtained in step (v).
This is the weighted geometric mean of relatives index number.
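The steps for both averages can be sketched in Python, assuming base-year values p0q0 as the weights. `weighted_means_of_relatives` is an illustrative helper, and the figures are those of Illustration 8, which follows. With full-precision logarithms the geometric mean comes to about 121.4. The worked solution's 121.3 reflects rounding of the four-figure logarithms and of the quotient.

```python
from math import log10

def weighted_means_of_relatives(p0, q0, p1):
    """Weighted arithmetic and geometric means of price relatives,
    with base-year values V = p0 * q0 as weights."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]   # P = (p1 / p0) x 100
    weights = [a * w for a, w in zip(p0, q0)]             # V = p0 x q0
    total_v = sum(weights)
    # Arithmetic mean: sum(PV) / sum(V)
    arithmetic = sum(p * v for p, v in zip(relatives, weights)) / total_v
    # Geometric mean: antilog of sum(V log P) / sum(V)
    mean_log = sum(v * log10(p) for p, v in zip(relatives, weights)) / total_v
    geometric = 10 ** mean_log
    return arithmetic, geometric

# Sugar, Flour, Milk (data of Illustration 8)
am, gm = weighted_means_of_relatives([3.0, 1.5, 1.0], [20, 40, 10], [4.0, 1.6, 1.5])
print(round(am, 2))  # 122.31
print(round(gm, 1))  # 121.4
```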

The following example shall illustrate the steps : 

Illustration 8. From the following data compute price index by applying
weighted average of price relatives method using:

(a) arithmetic mean, and

(b) geometric mean.

Commodities     p0 (Rs.)    q0          p1 (Rs.)
Sugar           3.0         20 kg.      4.0
Flour           1.5         40 kg.      1.6
Milk            1.0         10 lt.      1.5


* If current year values are employed, the weights are p1q1. If theoretical values
are used as weights, the weights are p1q0 or p0q1.
Solution. (a) INDEX NUMBER USING WEIGHTED ARITHMETIC MEAN
OF PRICE RELATIVES

Commodities   p0         q0        p1         V = p0q0   P = (p1/p0) × 100          PV
Sugar         Rs. 3.0    20 kg.    Rs. 4.0    60         (4.0/3.0) × 100 = 133.33   8,000
Flour         Rs. 1.5    40 kg.    Rs. 1.6    60         (1.6/1.5) × 100 = 106.67   6,400
Milk          Rs. 1.0    10 lt.    Rs. 1.5    10         (1.5/1.0) × 100 = 150.00   1,500
                                              ΣV = 130                              ΣPV = 15,900

$$P_{01}=\frac{\sum PV}{\sum V}=\frac{15{,}900}{130}=122.31$$

This means that there has been a 22.31 per cent increase in prices over the base
level.
(b) INDEX NUMBER USING GEOMETRIC MEAN OF PRICE RELATIVES

Commodities   p0         q0        p1         V     P        log P     V log P
Sugar         Rs. 3.0    20 kg.    Rs. 4.0    60    133.3    2.1249    127.494
Flour         Rs. 1.5    40 kg.    Rs. 1.6    60    106.7    2.0282    121.692
Milk          Rs. 1.0    10 lt.    Rs. 1.5    10    150.0    2.1761     21.761
                                              ΣV = 130                 ΣV log P = 270.947

$$P_{01}=\text{Antilog}\left[\frac{\sum V\log P}{\sum V}\right]=\text{Antilog}\left[\frac{270.947}{130}\right]=\text{Antilog}\ 2.084=121.3$$


The result obtained by applying the Laspeyres method would come
out to be the same as that obtained by the weighted arithmetic mean of price
relatives method (as shown below):

PRICE INDEX BY LASPEYRES METHOD

Commodities   p0         q0        p1         p1q0    p0q0
Sugar         Rs. 3.0    20 kg.    Rs. 4.0    80      60
Flour         Rs. 1.5    40 kg.    Rs. 1.6    64      60
Milk          Rs. 1.0    10 lt.    Rs. 1.5    15      10
                                              Σp1q0   Σp0q0
                                              =159    =130

$$P_{01}=\frac{\sum p_1q_0}{\sum p_0q_0}\times 100=\frac{159}{130}\times 100=122.3$$


The answer is the same as that obtained by the weighted arithmetic mean of price
relatives method. This is because the weighted average of price relatives method can be
transformed to the weighted aggregative method (given by Laspeyres) as follows:

$$P_{01}=\frac{\sum PV}{\sum V}=\frac{\sum\left(\frac{p_1}{p_0}\times 100\right)p_0q_0}{\sum p_0q_0}=\frac{\sum p_1q_0}{\sum p_0q_0}\times 100$$

Since the weighted average of price relatives (with base year values as weights) is the same as the Laspeyres aggregative index of actual prices, the question arises as to why we treat it as a separate method of constructing index numbers. The transformation is based upon two conditions not necessarily always present:

1. that the arithmetic mean is being used; and

2. that base year values are used as the weights.

If these two conditions are not present, then the original formula is different and not susceptible to transformation in this way. Moreover, sometimes index numbers are constructed by combining existing index numbers. Since the data in this case (index numbers) are given in the form of relatives, the untransformed formula involving relatives must be used, and the transformed formula is therefore not applicable.

It may be advisable to use the weighted average of price relatives method for another reason, inherent in the average of relatives method: since the average of price relatives requires us to compute a price relative for each commodity, we develop series of price relatives which may be analysed in themselves.
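The equivalence, and its dependence on base-year value weights, can be checked numerically. The sketch below reuses the figures of Illustration 8. The choice of plain quantities q0 as the alternative weights is arbitrary and only serves to show that the equivalence breaks down.

```python
def laspeyres(p0, q0, p1):
    """Laspeyres price index (x 100): sum(p1 q0) / sum(p0 q0)."""
    return sum(b * q for b, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0)) * 100

def weighted_am_of_relatives(p0, p1, weights):
    """Weighted arithmetic mean of price relatives (p1 / p0) x 100."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

p0, q0, p1 = [3.0, 1.5, 1.0], [20, 40, 10], [4.0, 1.6, 1.5]
base_values = [a * q for a, q in zip(p0, q0)]   # V = p0 q0

print(laspeyres(p0, q0, p1))                           # about 122.31
print(weighted_am_of_relatives(p0, p1, base_values))   # the same value
# With weights other than base-year values (plain quantities q0 here, an
# arbitrary choice for illustration) the two no longer coincide.
print(weighted_am_of_relatives(p0, p1, q0))            # about 120.48
```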

Illustration 9. By using the average of the quantities of two years as weights,
compute a price index.

                Quantities          Price in 1974   Price in 1975
Commodities     1974      1975      (Rs.)           (Rs.)
A               10        16        20              25
B               9         7         25              28
C               20        24        40              40

Solution. COMPUTATION OF PRICE INDEX USING AVERAGE OF THE
TWO-YEAR QUANTITIES AS WEIGHTS

              Quantity   Quantity   Average           Price in   Price in
              in 1974    in 1975    quantity          1974       1975
Commodities   (q0)       (q1)       q = (q0 + q1)/2   (p0)       (p1)       p1q      p0q
A             10         16         13                20         25         325      260
B             9          7          8                 25         28         224      200
C             20         24         22                40         40         880      880
                                                                            Σp1q     Σp0q
                                                                            =1,429   =1,340

Applying Kelly's method,

$$P_{01}=\frac{\sum p_1q}{\sum p_0q}\times 100,\quad\text{where } \sum p_1q=1{,}429,\ \sum p_0q=1{,}340$$

$$P_{01}=\frac{1{,}429}{1{,}340}\times 100=106.64$$
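A brief Python sketch of the same computation follows. `index_with_average_quantities` is a hypothetical helper name. Because the factor 1/2 in the average cancels between numerator and denominator, this gives the same figure as Σp1(q0 + q1)/Σp0(q0 + q1) × 100, i.e., the Marshall-Edgeworth formula.

```python
def index_with_average_quantities(q0, q1, p0, p1):
    """Kelly-type aggregative price index (x 100) weighted by the average
    of the two years' quantities."""
    q = [(a + b) / 2 for a, b in zip(q0, q1)]
    return sum(p * w for p, w in zip(p1, q)) / sum(p * w for p, w in zip(p0, q)) * 100

index = index_with_average_quantities(q0=[10, 9, 20], q1=[16, 7, 24],
                                      p0=[20, 25, 40], p1=[25, 28, 40])
print(round(index, 2))  # 106.64
```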
Merits of Weighted Average of Relative Indices

The following are the special advantages of weighted average of
relative indices over weighted aggregative indices:

(1) When different index numbers are constructed by the average of
relatives method, all of which have the same base, they can be combined
to form a new index.

(2) When an index is computed by selecting one item from each of 
the many sub-groups of items, the values of each sub-group may be used 
as weights. Then only the method of weighted average of relatives is 
appropriate. 

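(3) When a new commodity is introduced to replace the one formerly
used, the relative for the new item may be spliced to the relative for the
old one, using the former value weights.

(4) The price or quantity relatives for each single item in the
aggregate are, in effect, themselves a simple index that often yields
valuable information for analysis.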

Quantity Index Numbers 

Price index numbers measure and permit comparison of the prices of
certain goods; quantity index numbers, on the other hand, measure the
physical volume of production, construction or employment. Though
price indices are more widely used, production indices are highly significant
as indicators of the level of output in the economy or in parts of it.

In constructing quantity index numbers, the problems confronting
the statistician are analogous to those involved in price indices. We
measure changes in quantities, and when we weight we use prices or
values as weights. Quantity indices can be obtained easily by changing p
to q and q to p in the various formulae discussed above.

Thus when Laspeyres' method is used,

$$Q_{01}=\frac{\sum q_1p_0}{\sum q_0p_0}\times 100$$

When Paasche's formula is used,

$$Q_{01}=\frac{\sum q_1p_1}{\sum q_0p_1}\times 100$$

When Fisher's formula is used,

$$Q_{01}=\sqrt{\frac{\sum q_1p_0}{\sum q_0p_0}\times\frac{\sum q_1p_1}{\sum q_0p_1}}\times 100$$

These formulae represent the quantity index in which the quantities
of the different commodities are weighted by their prices. However, any
other suitable weights can be used instead.
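The "interchange p and q" idea can be made concrete by writing each formula in terms of a generic item x (what is compared) and weight w. The Python sketch below is illustrative: the names `laspeyres`, `paasche` and `fisher` and their argument order are assumptions of this sketch. Prices are passed as x for a price index and quantities as x for a quantity index. The usage line applies it to Illustration 10, which follows. Only base-year prices are given there, so only the Laspeyres form applies.

```python
from math import sqrt

def aggregate(x, w):
    """Sum of x times w over all commodities."""
    return sum(a * b for a, b in zip(x, w))

def laspeyres(x0, w0, x1):
    """sum(x1 w0) / sum(x0 w0) x 100, using base-period weights w0."""
    return aggregate(x1, w0) / aggregate(x0, w0) * 100

def paasche(x0, x1, w1):
    """sum(x1 w1) / sum(x0 w1) x 100, using current-period weights w1."""
    return aggregate(x1, w1) / aggregate(x0, w1) * 100

def fisher(x0, w0, x1, w1):
    """Geometric mean of the Laspeyres and Paasche forms (x 100)."""
    return sqrt(laspeyres(x0, w0, x1) * paasche(x0, x1, w1))

# Price index:    x = prices,     w = quantities  -> laspeyres(p0, q0, p1)
# Quantity index: x = quantities, w = prices      -> laspeyres(q0, p0, q1)
q0, q1, p0 = [30, 20, 10], [25, 30, 15], [30, 40, 20]   # Illustration 10
q_index = laspeyres(q0, p0, q1)
print(round(q_index, 1))  # 118.4
```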

Illustration 10. From the following data compute a quantity index:

Commodity     Quantity            Price in 1973
              1973      1974      (Rs.)
A             30        25        30
B             20        30        40
C             10        15        20

Solution. COMPUTATION OF QUANTITY INDEX

Commodities   q0     q1     p0     q1p0      q0p0
A             30     25     30     750       900
B             20     30     40     1,200     800
C             10     15     20     300       200
                                   Σq1p0     Σq0p0
                                   =2,250    =1,900

$$Q_{01}=\frac{\sum q_1p_0}{\sum q_0p_0}\times 100=\frac{2{,}250}{1{,}900}\times 100=118.4$$

Thus, compared to 1973 the quantity index has gone up by 18.4 per cent.


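Illustration 11. Compute by a suitable method the index number of quantity
from the following data:

              1974                      1975
Commodities   Price     Total value     Price     Total value
A             8         80              10        110
B             10        90              12        108
C             16        256             20        340

Solution. Since we are given the prices and the total values, the quantity figures are obtained by dividing the value figures by the corresponding prices. As data for both years are available, we apply Fisher's method for finding out the quantity index.

COMPUTATION OF QUANTITY INDEX

Commodities   p0    q0    p1    q1    q1p0     q0p0     q1p1     q0p1
A             8     10    10    11    88       80       110      100
B             10    9     12    9     90       90       108      108
C             16    16    20    17    272      256      340      320
                                      Σq1p0    Σq0p0    Σq1p1    Σq0p1
                                      =450     =426     =558     =528

$$Q_{01}=\sqrt{\frac{\sum q_1p_0}{\sum q_0p_0}\times\frac{\sum q_1p_1}{\sum q_0p_1}}\times 100=\sqrt{\frac{450}{426}\times\frac{558}{528}}\times 100$$

$$=\sqrt{1.116}\times 100=1.056\times 100=105.6$$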
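The two stages of the solution, recovering quantities from values and then applying Fisher's formula with p and q interchanged, can be sketched in Python using the figures of Illustration 11. At full precision the index is about 105.66. The 105.6 above comes from rounding √1.116.

```python
from math import sqrt

def aggregate(x, w):
    """Sum of x times w over all commodities."""
    return sum(a * b for a, b in zip(x, w))

# Illustration 11: prices and total values for A, B, C
p0, v0 = [8, 10, 16], [80, 90, 256]     # 1974
p1, v1 = [10, 12, 20], [110, 108, 340]  # 1975

# Quantities are recovered by dividing each value by its price.
q0 = [v / p for v, p in zip(v0, p0)]    # 10, 9, 16
q1 = [v / p for v, p in zip(v1, p1)]    # 11, 9, 17

# Fisher's quantity index: the ideal formula with p and q interchanged.
q_index = sqrt(aggregate(q1, p0) / aggregate(q0, p0)
               * aggregate(q1, p1) / aggregate(q0, p1)) * 100
print(round(q_index, 2))  # 105.66
```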


Value Index Numbers 

The value of a single commodity is the product of its price and quantity. Thus a value index V is the sum of the values of a given year divided by the sum of the values of the base year. The formula therefore is

$$V=\frac{\sum p_1q_1}{\sum p_0q_0}\times 100$$

where V = value index,
Σp1q1 = total value of all commodities in the given period,
and Σp0q0 = total value of all commodities in the base period.

Since in most cases the value figures are given, the formula can be stated more simply as

$$V=\frac{\sum V_1}{\sum V_0}\times 100$$

in which V1 and V0 stand for the values in the given and the base period respectively.

In this type of index both price and quantity are variable in the
numerator. Weights do not have to be applied, since they are inherent in
the value figures. A value index therefore is an aggregate of values.
It measures the change in actual values between the base and the given
periods.

The value index is not in wide use, although because of the unsatis- 
factory nature of price and quantity indices, it has been occasionally sugges- 
ted that they be replaced by the value index. The temptation, however, 
must be resisted, since the concepts of price level and quantity level answer 
questions that cannot be answered by the value level. Furthermore, an 
aggregate of values may be viewed as the product of a price level and a 
quantity level. The division of an aggregate of value into its price and 
quantity factors may be arbitrary, but this arbitrariness need not create 
any confusion of thought as long as our concepts of the two factors are 
consistent. 

The test of consistency is that the product of the price and quantity
indices must produce the value index.
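As a small illustration, the Python sketch below computes the value index for the prices and derived quantities of Illustration 11. It then applies the consistency test to a Laspeyres price index paired with a Laspeyres quantity index. The pairing is chosen only for contrast: their product (about 130.93) falls short of the value index (about 130.99), so this pair is not consistent in the sense just described.

```python
def aggregate(x, w):
    """Sum of x times w over all commodities."""
    return sum(a * b for a, b in zip(x, w))

# Prices and quantities of Illustration 11 (quantities derived from values)
p0, q0 = [8, 10, 16], [10, 9, 16]
p1, q1 = [10, 12, 20], [11, 9, 17]

value_index = aggregate(p1, q1) / aggregate(p0, q0) * 100   # 558 / 426 x 100

# Consistency check with a Laspeyres price index and a Laspeyres quantity index
laspeyres_p = aggregate(p1, q0) / aggregate(p0, q0) * 100   # 528 / 426 x 100
laspeyres_q = aggregate(q1, p0) / aggregate(q0, p0) * 100   # 450 / 426 x 100

print(round(value_index, 2))                       # 130.99
print(round(laspeyres_p * laspeyres_q / 100, 2))   # 130.93, not the value index
```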

TESTS OF ADEQUACY OF INDEX NUMBER FORMULAE 


Several formulae have been suggested for constructing index numbers, and the problem is that of selecting the most appropriate one in a given situation. The following tests are suggested for choosing an appropriate index:

1. Unit Test.*
2. Time Reversal Test.
3. Factor Reversal Test.
4. Circular Test.

1. Unit Test

The unit test requires that the formula for constructing an index should be independent of the units in which, or for which, prices and quantities are quoted. Except for the simple (unweighted) aggregative index, all other formulae discussed in this chapter satisfy this test.

* Freund and Williams: Modern Business Statistics.
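Why the simple aggregative index fails the unit test can be seen with a small Python sketch on hypothetical prices (wheat quoted first per kg, then per quintal). Quoting wheat per quintal moves the simple aggregative index from about 116.7 to about 148.1. The simple mean of price relatives stays at 130, because each relative p1/p0 is a pure number.

```python
def simple_aggregative(p0, p1):
    """Simple (unweighted) aggregative index: sum(p1) / sum(p0) x 100."""
    return sum(p1) / sum(p0) * 100

def simple_mean_of_relatives(p0, p1):
    """Simple arithmetic mean of price relatives (p1 / p0) x 100."""
    return sum(b / a * 100 for a, b in zip(p0, p1)) / len(p0)

# Hypothetical prices: wheat (Rs. per kg) and cloth (Rs. per metre)
p0_kg, p1_kg = [2.0, 10.0], [3.0, 11.0]
# The same wheat prices quoted per quintal (100 kg); cloth unchanged
p0_qtl, p1_qtl = [200.0, 10.0], [300.0, 11.0]

print(simple_aggregative(p0_kg, p1_kg), simple_aggregative(p0_qtl, p1_qtl))
print(simple_mean_of_relatives(p0_kg, p1_kg), simple_mean_of_relatives(p0_qtl, p1_qtl))
```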


2. Time Reversal Test 

Prof. Irving Fisher has made a careful study of the various proposals 
for computing index numbers and has suggested various tests to be applied 
to any formula to indicate whether or not it is satisfactory. The two most 
important of these he calls the time reversal test and the factor reversal 
test. 


Time reversal test is a test to determine whether a given method will work both ways in time, forward and backward. In the words of Fisher, "The test is that the formula for calculating the index number should be such that it will give the same ratio between one point of comparison and the other, no matter which of the two is taken as base." In other words, when the data for any two years are treated by the same method, but with the bases reversed, the two index numbers secured should be reciprocals of each other so that their product is unity. Symbolically, the following relation should be satisfied:

$$P_{01}\times P_{10}=1$$

where P01 is the index for time "1" on time "0" as base and P10 is the index for time "0" on time "1" as base. If the product is not unity, there is said to be a time bias in the method. Thus if from 1969 to 1970 the price of wheat increased from Rs. 60 to Rs. 80 per quintal, the price in 1970 should be 133⅓ per cent of the price in 1969 and the price in 1969 should be 75 per cent of the price in 1970. One figure is the reciprocal of the other; their product (1.333 × 0.75, or exactly 4/3 × 3/4) is unity. This is obviously true for each individual price relative and, according to the time reversal test, it should be true for the index number.

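The test is not satisfied by the Laspeyres method and the Paasche method, as can be seen below:

When Laspeyres method is used,

$$P_{01}=\frac{\sum p_1q_0}{\sum p_0q_0},\qquad P_{10}=\frac{\sum p_0q_1}{\sum p_1q_1}$$

$$P_{01}\times P_{10}=\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_0q_1}{\sum p_1q_1}\neq 1$$

and the test is not satisfied.

When Paasche method is used,

$$P_{01}=\frac{\sum p_1q_1}{\sum p_0q_1},\qquad P_{10}=\frac{\sum p_0q_0}{\sum p_1q_0}$$

$$P_{01}\times P_{10}=\frac{\sum p_1q_1}{\sum p_0q_1}\times\frac{\sum p_0q_0}{\sum p_1q_0}\neq 1$$

and the test is not satisfied.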


There are five methods which do satisfy the test:

(1) Fisher's Ideal formula.
(2) Simple geometric mean of price relatives.
(3) Aggregates with fixed weights.
(4) Weighted geometric mean of price relatives, if fixed weights are used.
(5) Marshall-Edgeworth method.

Let us now see how Fisher's Ideal formula satisfies the test.

Proof:

$$P_{01}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}}$$

Changing time, i.e., 0 to 1 and 1 to 0,

$$P_{10}=\sqrt{\frac{\sum p_0q_1}{\sum p_1q_1}\times\frac{\sum p_0q_0}{\sum p_1q_0}}$$

$$P_{01}\times P_{10}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}\times\frac{\sum p_0q_1}{\sum p_1q_1}\times\frac{\sum p_0q_0}{\sum p_1q_0}}=\sqrt{1}=1$$

Since P01 × P10 = 1, Fisher's Ideal Index satisfies the test.
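The test can also be run numerically. The Python sketch below reverses the periods for each formula and multiplies. The data are those of Illustration 12, given later in this section, and the function names are illustrative. The products come to about 1.0093 for Laspeyres, about 0.9908 for Paasche, and 1 (up to floating-point rounding) for Fisher.

```python
from math import sqrt

def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

# Each formula returns the index of period 1 on period 0 as a ratio (not x 100).
def laspeyres(p0, q0, p1, q1):
    return aggregate(p1, q0) / aggregate(p0, q0)

def paasche(p0, q0, p1, q1):
    return aggregate(p1, q1) / aggregate(p0, q1)

def fisher(p0, q0, p1, q1):
    return sqrt(laspeyres(p0, q0, p1, q1) * paasche(p0, q0, p1, q1))

def time_reversal_product(formula, p0, q0, p1, q1):
    """P01 x P10: the second factor uses the same formula with the periods swapped."""
    return formula(p0, q0, p1, q1) * formula(p1, q1, p0, q0)

# Wheat, Gram, Barley: 1967 is period 0, 1968 is period 1
p0, q0 = [30, 25, 18], [50, 40, 50]
p1, q1 = [32, 30, 16], [50, 35, 55]

for formula in (laspeyres, paasche, fisher):
    print(formula.__name__, round(time_reversal_product(formula, p0, q0, p1, q1), 4))
```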


3. Factor Reversal Test

Another test suggested by Fisher is known as the factor reversal test. It holds that the product of a price index and the quantity index should be equal to the corresponding value index. In the words of Fisher, "Just as each formula should permit the interchange of the two times without giving inconsistent results, so it ought to permit interchanging the prices and quantities without giving inconsistent result, i.e., the two results multiplied together should give the true value ratio." In other words, the test is that the change in price multiplied by the change in quantity should be equal to the total change in value. The total value of a given commodity in a given year is the product of the quantity and the price per unit (total value = p × q). The ratio of the total value in one year to the total value in the preceding year is Σp1q1/Σp0q0. If, from one year to the next, both price and quantity should double, the price relative would be 200, the quantity relative 200, and the value relative 400. The total value in the second year would be four times the value in the first year. In other words, if p1 and p0 represent prices and q1 and q0 the quantities in the current year and the base year, respectively, and if P01 represents the change in price in the current year and Q01 the change in quantity in the current year, then

$$P_{01}\times Q_{01}=\frac{\sum p_1q_1}{\sum p_0q_0}$$

If the product is not equal to the value ratio, there is, with reference to this test, an error in one or both of the index numbers.
The factor reversal test is satisfied only by Fisher's Ideal Index:

Proof:

$$P_{01}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}}$$

Changing p to q and q to p,

$$Q_{01}=\sqrt{\frac{\sum q_1p_0}{\sum q_0p_0}\times\frac{\sum q_1p_1}{\sum q_0p_1}}$$

$$P_{01}\times Q_{01}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}\times\frac{\sum q_1p_0}{\sum q_0p_0}\times\frac{\sum q_1p_1}{\sum q_0p_1}}=\sqrt{\left(\frac{\sum p_1q_1}{\sum p_0q_0}\right)^2}=\frac{\sum p_1q_1}{\sum p_0q_0}$$

Since P01 × Q01 = Σp1q1/Σp0q0, the factor reversal test is satisfied by Fisher's ideal index.

This means, of course, that the formula serves equally well for constructing indices of quantities as for constructing indices of prices, the quantity index being derived by interchanging p and q in the ideal formula. None of the simple or weighted forms of elementary indices (arithmetic mean, harmonic mean, geometric mean) fulfil the requirements of the factor reversal test. It is thus obvious that the strong restrictions imposed by the factor reversal test compel its being ignored in the construction of many highly reputable index numbers.
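A numerical check of the factor reversal test is sketched below in Python, using the data of Illustration 13, given later in this section. The quantity index is obtained by calling the same `fisher` helper (an illustrative name) with the roles of p and q interchanged. For contrast, the product of the Laspeyres price and quantity indices (about 1.3806) misses the value ratio 1,880/1,360 (about 1.3824).

```python
from math import sqrt

def aggregate(x, w):
    """Sum of x times w over all commodities."""
    return sum(a * b for a, b in zip(x, w))

def fisher(x0, w0, x1, w1):
    """Fisher's ideal index (as a ratio) of x with weights w."""
    return sqrt(aggregate(x1, w0) / aggregate(x0, w0) * aggregate(x1, w1) / aggregate(x0, w1))

# Commodities A-E
p0, q0 = [6, 2, 4, 10, 8], [50, 100, 60, 30, 40]
p1, q1 = [10, 2, 6, 12, 12], [56, 120, 60, 24, 36]

price = fisher(p0, q0, p1, q1)            # P01
quantity = fisher(q0, p0, q1, p1)         # Q01: interchange p and q
value_ratio = aggregate(p1, q1) / aggregate(p0, q0)   # 1880 / 1360

# Laspeyres price and quantity indices, for contrast
lasp_p = aggregate(p1, q0) / aggregate(p0, q0)
lasp_q = aggregate(q1, p0) / aggregate(q0, p0)

print(price * quantity, value_ratio)   # equal, up to floating-point rounding
print(lasp_p * lasp_q)                 # differs from value_ratio
```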

Some authorities on the subject argue that there are no good logical reasons for claiming that an index number ought to meet these tests. For example, Karmel has pointed out that as far as the time reversal test is concerned, the collection of goods included in P01 is different from that included in P10 (q0 as against q1) and, therefore, one could hardly hope for consistent results.


4. Circular Test 


Another test of the adequacy of an index number formula is what is known as the circular test. If in the use of index numbers interest attaches not merely to a comparison of two years, but to the measurement of price changes over a period of years, it is frequently desirable to shift the base. A formula is said to meet this test if, for example, the 1970 index with 1965 as the base is 200, and the 1965 index with 1960 as the base is again 200, then the 1970 index with 1960 as the base must be 400. In other words, we should be able to get a consistent index for 1970 relative to 1960 by multiplying the 1970 index relative to 1965 by the corresponding index for 1965 relative to 1960. Clearly, the desirability of this property is that it enables us to adjust the index values from period to period without referring each time to the original base. A test of this shiftability of base is called the circular test.

This test is just an extension of the time reversal test. The test requires that if an index is constructed for the year a on base year b, and for the year b on base year c, we ought to get the same result as if we calculated directly an index for a on base year c without going through b as an intermediary.

Symbolically, if there are three years a, b, c, the circular test will be satisfied if

$$P_{ab}\times P_{bc}\times P_{ca}=1$$
The Laspeyres index does not satisfy the test, as can be seen from the following:

If the three years are 0, 1, 2, the index by Laspeyres method will be

$$P_{01}\times P_{12}\times P_{20}=\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_2q_1}{\sum p_1q_1}\times\frac{\sum p_0q_2}{\sum p_2q_2}$$

The product of all these is not equal to 1. Hence the test is not satisfied. Similarly it can be shown that Paasche's index and Fisher's index do not satisfy the test. However, the simple aggregative method and the fixed weight aggregative method satisfy the test, as can be seen from the following:

When the test is applied to the simple aggregative method we will get

$$P_{01}\times P_{12}\times P_{20}=\frac{\sum p_1}{\sum p_0}\times\frac{\sum p_2}{\sum p_1}\times\frac{\sum p_0}{\sum p_2}=1$$

Similarly, when applied to the fixed weight aggregative method we will get

$$P_{01}\times P_{12}\times P_{20}=\frac{\sum p_1q}{\sum p_0q}\times\frac{\sum p_2q}{\sum p_1q}\times\frac{\sum p_0q}{\sum p_2q}=1$$
The circular test (which amounts, in fact, to a modification of the time reversal test) is met when

$$P_{01}\times P_{12}\times P_{20}=1$$

An index which satisfies this test has the advantage of reducing the computation every time a change in the base year has to be made. Such index numbers can be adjusted from year to year without referring each time to the original base.

The circular test is not met by the ideal index or by any of the weighted aggregatives with changing weights. The test is met by the simple geometric mean of price relatives and the weighted aggregative with fixed weights. The reason why Laspeyres and Paasche index numbers and their derivatives, the Marshall-Edgeworth and the Ideal indices, do not meet the circular test is that the weights in these index numbers depend on the periods between which comparisons are being made. If these periods change, the weights change. For example, if some other period is taken as the base rather than period 0, the weights in Laspeyres' index are no longer q0 but the quantities of that new base period.

Karmel has pointed out that although it may seem reasonable to argue that if a price index between periods 0 and 1 has risen to M and between periods 1 and 2 to N, then between periods 0 and 2 it should have risen to MN, a moment's reflection will show that this requirement is not reasonable. An index number has meaning only in terms of the system of weighting adopted, and one may produce many numerically different but quite valid indices for comparing two periods. The weighting system used in P01 (Laspeyres) is the same as that in P02 (Laspeyres), but different from that in P12 (Laspeyres). Consequently, the increase M is an increase in something different from that in which N is the increase. The product MN is, therefore, a mixture, the exact meaning of which is not clear and which could not be expected to equal a direct comparison between periods 0 and 2.
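The contrast between fixed and changing weights can be sketched numerically. The three-period prices and quantities below are purely hypothetical, and using the period-0 quantities as the fixed weights is an arbitrary choice. Going round the circle 0 → 1 → 2 → 0, the fixed-weight aggregative index returns exactly to 1 because the terms telescope. The Laspeyres index, whose weights change with the base, gives about 1.041.

```python
def aggregate(x, w):
    """Sum of x times w over all commodities."""
    return sum(a * b for a, b in zip(x, w))

# Hypothetical prices and quantities for two commodities in periods 0, 1, 2
p = {0: [10, 20], 1: [12, 22], 2: [15, 21]}
q = {0: [5, 8], 1: [6, 7], 2: [4, 9]}
fixed_q = [5, 8]   # fixed weights (here, arbitrarily, the period-0 quantities)

def laspeyres(a, b):
    """Index for period b on period a as base, weighted by the base quantities q[a]."""
    return aggregate(p[b], q[a]) / aggregate(p[a], q[a])

def fixed_weight(a, b):
    """Index for period b on period a as base, weighted by the same fixed quantities."""
    return aggregate(p[b], fixed_q) / aggregate(p[a], fixed_q)

def circular_product(index):
    return index(0, 1) * index(1, 2) * index(2, 0)

print(circular_product(fixed_weight))  # 1, up to floating-point rounding
print(circular_product(laspeyres))     # about 1.041, so the test fails
```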

Illustration 12. The following figures relate to the prices and quantities of certain commodities. Construct an appropriate index number and show if it satisfies the time reversal test.

                1967                    1968
Commodities     Price    Quantities     Price    Quantities
Wheat           30       50             32       50
Gram            25       40             30       35
Barley          18       50             16       55

(B. Com., Osmania, 1974)

Solution: INDEX NUMBER BY FISHER'S IDEAL METHOD

                1967            1968
Commodities     p0      q0      p1      q1      p1q0     p0q0     p1q1     p0q1
Wheat           30      50      32      50      1,600    1,500    1,600    1,500
Gram            25      40      30      35      1,200    1,000    1,050    875
Barley          18      50      16      55      800      900      880      990
                                                Σp1q0    Σp0q0    Σp1q1    Σp0q1
                                                =3,600   =3,400   =3,530   =3,365

$$P_{01}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}}\times 100=\sqrt{\frac{3{,}600}{3{,}400}\times\frac{3{,}530}{3{,}365}}\times 100$$

$$=\sqrt{1.1107}\times 100=1.054\times 100=105.4$$
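Time reversal test is satisfied when P01 × P10 = 1.

$$P_{01}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}},\qquad P_{10}=\sqrt{\frac{\sum p_0q_1}{\sum p_1q_1}\times\frac{\sum p_0q_0}{\sum p_1q_0}}$$

Substituting the values of Σp1q0, Σp0q0, etc.,

$$P_{01}=\sqrt{\frac{3{,}600}{3{,}400}\times\frac{3{,}530}{3{,}365}},\qquad P_{10}=\sqrt{\frac{3{,}365}{3{,}530}\times\frac{3{,}400}{3{,}600}}$$

$$P_{01}\times P_{10}=\sqrt{\frac{3{,}600}{3{,}400}\times\frac{3{,}530}{3{,}365}\times\frac{3{,}365}{3{,}530}\times\frac{3{,}400}{3{,}600}}=\sqrt{1}=1$$

Hence time reversal test is satisfied by the above formula.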


Illustration 13. Show with the help of the following data that the Time and Factor Reversal Tests are satisfied by Fisher's Ideal Formula for index number construction.

              Base year     Base year        Current year    Current year
Commodities   Price (Rs.)   Quantity (kg.)   Price (Rs.)     Quantity (kg.)
A             6             50               10              56
B             2             100              2               120
C             4             60               6               60
D             10            30               12              24
E             8             40               12              36

(B. Com., Marathwada, 1974)

Solution: COMPUTATIONS FOR TIME REVERSAL TEST AND FACTOR REVERSAL TEST

              Base year   Base year        Current year   Current year
Commodities   price       quantity (kg.)   price          quantity (kg.)
              p0          q0               p1             q1               p1q0     p0q0     p1q1     p0q1
A             6           50               10             56               500      300      560      336
B             2           100              2              120              200      200      240      240
C             4           60               6              60               360      240      360      240
D             10          30               12             24               360      300      288      240
E             8           40               12             36               480      320      432      288
                                                                           Σp1q0    Σp0q0    Σp1q1    Σp0q1
                                                                           =1,900   =1,360   =1,880   =1,344


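Time reversal test is satisfied when P01 × P10 = 1.

$$P_{01}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}}\quad\text{and}\quad P_{10}=\sqrt{\frac{\sum p_0q_1}{\sum p_1q_1}\times\frac{\sum p_0q_0}{\sum p_1q_0}}$$

Substituting the values,

$$P_{01}\times P_{10}=\sqrt{\frac{1{,}900}{1{,}360}\times\frac{1{,}880}{1{,}344}\times\frac{1{,}344}{1{,}880}\times\frac{1{,}360}{1{,}900}}=\sqrt{1}=1$$

Hence time reversal test is satisfied.

Factor reversal test is satisfied when

$$P_{01}\times Q_{01}=\frac{\sum p_1q_1}{\sum p_0q_0}$$

$$P_{01}\times Q_{01}=\sqrt{\frac{\sum p_1q_0}{\sum p_0q_0}\times\frac{\sum p_1q_1}{\sum p_0q_1}\times\frac{\sum q_1p_0}{\sum q_0p_0}\times\frac{\sum q_1p_1}{\sum q_0p_1}}$$

Substituting the values,

$$P_{01}\times Q_{01}=\sqrt{\frac{1{,}900}{1{,}360}\times\frac{1{,}880}{1{,}344}\times\frac{1{,}344}{1{,}360}\times\frac{1{,}880}{1{,}900}}=\sqrt{\left(\frac{1{,}880}{1{,}360}\right)^2}=\frac{1{,}880}{1{,}360}$$

which is also the value of Σp1q1/Σp0q0.

Hence Fisher's Ideal Index satisfies the factor reversal test.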


THE CHAIN INDEX NUMBERS 


In the fixed base method discussed so far the base remains the same 
throughout the series of the index. This method, though convenient, has 
certain limitations. As time elapses, conditions which were once important
become less significant and it becomes more difficult to compare
accurately present conditions with those of a remote period. New items 
may have to be included and old ones may have to be deleted in order to 
make the index more representative. In such cases it may be desirable 
to use the chain base method. When this method is used the comparisons 
are not made with a fixed base ; rather the base changes from year to 
year. For example, for 1970, 1969 will be the base; for 1971, 1970 will 
be the base, and so on. If, however, it is desired to associate these relatives 
to a common base the results may be chained to obtain chain indices. 
Thus in its simplest form, the chain index is one in which the figures for 
each year (or sub-period thereof) are first expressed as percentages of the 
preceding year. These percentages are then chained together by successive 
multiplication to form a chain index. 

Steps in Constructing a Chain Index

(i) Express the figures for each year as percentages of the preceding year. The results so obtained are called link relatives.

(ii) Chain together these percentages by successive multiplication to form a chain index. Chain index of any year is the average link relative of that year multiplied by chain index of previous year divided by 100. In the form of a formula,

$$\text{Chain index for current year}=\frac{\text{Average link relative of current year}\times\text{Chain index of previous year}}{100}$$


The link relatives* obtained in step (i) facilitate comparison from one year to another, i.e., between closely situated periods in which the q's are not likely to have changed much. The chain indices obtained in step (ii) by a process of chaining binary comparisons facilitate long-term comparisons.

Chain relatives differ from fixed-base relatives in computation. Chain relatives are computed from link relatives whereas fixed base relatives are computed directly from the original data. The results obtained by the two different methods should be the same, but they may differ from each other slightly due to rounding of decimal places. Since the process of computing chain relatives is quite complicated and the results are the same as the fixed base relatives obtained from the original data, chain relatives should be used when the original data are not available but the link relatives are.

* Link relative = (Current year's price / Previous year's price) × 100


Illustration 14. From the following data of the wholesale prices of wheat for the ten years construct index numbers (a) taking 1963 as base, and (b) by chain base method:

Year    Price of Wheat        Year    Price of Wheat
        (Rs. per quintal)             (Rs. per quintal)
1963    50                    1968    78
1964    60                    1969    82
1965    62                    1970    84
1966    65                    1971    88
1967    70                    1972    90

Solution: (a) CONSTRUCTION OF INDEX NUMBERS TAKING 1963 AS BASE

Year    Price of   Index Numbers           Year    Price of   Index Numbers
        Wheat      (1963 = 100)                    Wheat      (1963 = 100)
1963    50         100                     1968    78         (78/50) × 100 = 156
1964    60         (60/50) × 100 = 120     1969    82         (82/50) × 100 = 164
1965    62         (62/50) × 100 = 124     1970    84         (84/50) × 100 = 168
1966    65         (65/50) × 100 = 130     1971    88         (88/50) × 100 = 176
1967    70         (70/50) × 100 = 140     1972    90         (90/50) × 100 = 180

This means that from 1963 to 1964 there is a 20 per cent increase; from 1963 to 1965 there is a 24 per cent increase; from 1963 to 1966 there is a 30 per cent increase. If we are interested in finding out the increase from 1963 to 1964, from 1964 to 1965, from 1965 to 1966, we shall have to use the chain base method.
(b) CONSTRUCTION OF CHAIN INDICES

Year    Price of   Link Relatives             Chain Indices
        Wheat                                 (1963 = 100)
1963    50         100.00                     100
1964    60         (60/50) × 100 = 120.00     (120.00 × 100)/100 = 120
1965    62         (62/60) × 100 = 103.33     (103.33 × 120)/100 = 124
1966    65         (65/62) × 100 = 104.84     (104.84 × 124)/100 = 130
1967    70         (70/65) × 100 = 107.69     (107.69 × 130)/100 = 140
1968    78         (78/70) × 100 = 111.43     (111.43 × 140)/100 = 156
1969    82         (82/78) × 100 = 105.13     (105.13 × 156)/100 = 164
1970    84         (84/82) × 100 = 102.44     (102.44 × 164)/100 = 168
1971    88         (88/84) × 100 = 104.76     (104.76 × 168)/100 = 176
1972    90         (90/88) × 100 = 102.27     (102.27 × 176)/100 = 180

Note. The chain indices obtained in (b) above with 1963 = 100 are the same as the fixed base index numbers obtained in (a) above. In fact the chain index figure will always be equal to the fixed base index figure if there is only one series.
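The chaining in (b) can be sketched in Python. `link_relatives` and `chain_indices` are illustrative helper names. For a single series the chained product telescopes back to the ratio of the current price to the 1963 price, which is why the chain and fixed base figures agree.

```python
def link_relatives(prices):
    """Each year's figure as a percentage of the preceding year's (first year = 100)."""
    return [100.0] + [cur / prev * 100 for prev, cur in zip(prices, prices[1:])]

def chain_indices(links):
    """Chain index(t) = link relative(t) x chain index(t-1) / 100."""
    chain = [links[0]]
    for lr in links[1:]:
        chain.append(lr * chain[-1] / 100)
    return chain

wheat = [50, 60, 62, 65, 70, 78, 82, 84, 88, 90]   # 1963-1972
chain = chain_indices(link_relatives(wheat))
fixed = [p / wheat[0] * 100 for p in wheat]
print([round(c, 2) for c in chain])   # 100, 120, 124, 130, ..., 180, as in (a)
```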

Illustration 15. Calculate the fixed base index numbers and chain base index numbers from the following data. Are the two results the same? If not, why?

              Price in rupees
Commodity     1970    1971    1972    1973    1974
I             2       3       5       7       8
II            8       10      12      14      18
III           4       5       7       9       12

Solution. Since the base year is not specified, the first year in order of time, i.e., 1970, is taken as base. As no weights are given, the appropriate method for calculating fixed base index numbers is the price relative method. So for fixed base index numbers we have:

FIXED BASE INDEX NUMBERS (1970 = 100)

                  Price relatives
Commodity         1970    1971     1972     1973     1974
I                 100     150      250      350      400
II                100     125      150      175      225
III               100     125      175      225      300
Total             300     400      575      750      925
Average, i.e.,
fixed base
index No.         100     133.3    191.7    250.0    308.3

CHAIN BASE INDEX NUMBERS CHAINED TO 1970

                  Percentage based on preceding year
Commodity         1970    1971     1972     1973     1974
I                 100     150      166.7    140.0    114.3
II                100     125      120.0    116.7    128.6
III               100     125      140.0    128.6    133.3
Total of link
relatives         300     400      426.7    385.3    376.2
Average           100     133.3    142.2    128.4    125.4
Chain indices
(1970 = 100)      100     133.3    189.6    243.4    305.2

On comparison we find that, except for the first two years, the two series of index numbers obtained by the fixed base and chain base methods are different. This is because when fixed base and chain base index numbers are computed by combining two or more series, the chain index numbers will usually differ from the fixed base index numbers except for the first two given years. The chain index multiplies together the averages of the link relatives, whereas the fixed base index averages the products of the link relatives (the fixed base relatives), and a product of averages is in general not the same as an average of products.
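The divergence can be reproduced with a short Python sketch over the three price series. `fixed_base` and `chain_base` are illustrative names. Working at full precision, the chain series comes to about 100, 133.3, 189.6, 243.5 and 305.4. The table's 243.4 and 305.2 were chained from averages already rounded to one decimal. The fixed base series is 100, 133.3, 191.7, 250.0, 308.3.

```python
def fixed_base(series):
    """Average of fixed-base relatives (first year = 100) across commodities."""
    n_years = len(series[0])
    return [sum(s[t] / s[0] * 100 for s in series) / len(series) for t in range(n_years)]

def chain_base(series):
    """Average the link relatives each year, then chain the averages."""
    n_years = len(series[0])
    chain = [100.0]
    for t in range(1, n_years):
        avg_link = sum(s[t] / s[t - 1] * 100 for s in series) / len(series)
        chain.append(avg_link * chain[-1] / 100)
    return chain

prices = [
    [2, 3, 5, 7, 8],      # commodity I, 1970-1974
    [8, 10, 12, 14, 18],  # commodity II
    [4, 5, 7, 9, 12],     # commodity III
]
print([round(x, 1) for x in fixed_base(prices)])
print([round(x, 1) for x in chain_base(prices)])
```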

Conversion of Chain Index to Fixed Base Index

At times it may be desired to convert the chain base index numbers into fixed base index numbers. In such a case the following procedure is followed:

(1) For the first year the fixed base index will be taken the same as the chain base index. However, if the index numbers are to be constructed by taking the first year as the base, then the index for the first year is taken as 100.

(2) For calculating the indices for other years the following formula is used:

$$\text{Current year's F.B.I.}=\frac{\text{Current year's C.B.I.}\times\text{Previous year's F.B.I.}}{100}$$

F.B.I. = Fixed base index No.; C.B.I. = Chain base index No.

Illustration 16. From the chain base index numbers given below prepare fixed base index numbers:

1969    1970    1971    1972    1973
80      110     120     90      140

Solution: COMPUTATION OF FIXED BASE INDEX NUMBERS

Year    Chain base index    Fixed base index
        numbers             numbers
1969    80                  80
1970    110                 (110 × 80)/100 = 88.00
1971    120                 (120 × 88)/100 = 105.60
1972    90                  (90 × 105.60)/100 = 95.04
1973    140                 (140 × 95.04)/100 = 133.06
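The conversion rule is a simple running product. A minimal Python sketch follows, with `chain_to_fixed` as a hypothetical helper name. It takes the first year's fixed base index as given, as in step (1) above. For the figures of Illustration 16 it gives approximately 80, 88, 105.6, 95.04 and 133.06.

```python
def chain_to_fixed(cbi):
    """F.B.I.(t) = C.B.I.(t) x F.B.I.(t-1) / 100; the first year is taken as given."""
    fbi = [cbi[0]]
    for c in cbi[1:]:
        fbi.append(c * fbi[-1] / 100)
    return fbi

fbi = chain_to_fixed([80, 110, 120, 90, 140])   # 1969-1973
print([round(x, 2) for x in fbi])
```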
Usefulness of the Chain Base Method

1. The chain base method has a great significance in practice because in economic and business data we are more often concerned with making comparisons with the previous period and not with any distant past. The link relatives obtained by chain base method serve this purpose.

2. Chain base method permits the introduction of new commodities and the deletion of old ones without necessitating either the recalculation of entire series or other drastic changes. Because of this flexibility chain index is used in many types of indices such as the consumer price index and the wholesale price index.

3. Weights can be adjusted as frequently as possible. This flexibility 
is of great significance in many types of index numbers. 

4. Index numbers calculated by the chain base method are free to a 
greater extent from seasonal variations than those obtained by the other 
method. 

Limitation of the Chain Index 


The limitation of the chain index is that while the percentages of 
previous year figures give accurate comparisons of year-to-year changes, 
the long-range comparisons of chained percentages are not strictly valid. 
However, when the index number user wishes to make year-to-year 
comparisons, as is so often done by the businessman, the percentages of the 
preceding year provide a flexible and useful tool.* 


* Croxton and Cowden : Applied General Statistics. 
BASE SHIFTING, SPLICING AND DEFLATING THE INDEX NUMBERS
Base Shifting:


One of the most frequent operations necessary in the use of index 
numbers is changing the base of an index. Such a change is usually 
referred to as shifting the base. There may be two reasons for this : 


1. The previous base has become too old and is almost useless for 
purposes of comparison. In practice it is desirable that the base period 
chosen for comparison purposes be a period of economic stability which is 
not too far distant in the past. 


2. Comparison is to be made with another series of index numbers 
having a different base. For example, the consumer price index for a
certain region is available with 1960 as base (i.e., 1960 = 100). Now
suppose an investigator wants to compare cost of living changes in the 
community with those of another region for which the corresponding index 
is given with the base year 1965. In such a case it shall be necessary to 
shift the base of the first series from 1960 to 1965. 

When the base period is to be changed, one possibility is to recompute
all index numbers using the new base period. A simpler approximate
method is to divide all index numbers for the various years corresponding
to the old base period by the index number corresponding to the new base
period, expressing the results as percentages. These results represent the
new index numbers, the index number for the new base period being
100.

Mathematically speaking, this method is strictly applicable only if 
the index numbers satisfy the circular test. However, for many types of 
index numbers the method, fortunately, yields results which in practice are 
close enough to those which would be obtained theoretically. 
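The approximate method of shifting the base can be sketched in Python as below. This is a minimal illustration, not a prescribed procedure: the series is assumed to be held in a dictionary keyed by year, the function name `shift_base` is hypothetical, and, like the method itself, the result matches a full recomputation only if the index satisfies the circular test. The figures are those of Illustration 17.

```python
def shift_base(indices, new_base):
    """Re-express an index series so that the new base period equals 100.

    Approximate method: divide each index by the index of the new base
    period and multiply by 100. Exact only if the index satisfies the
    circular test.
    """
    divisor = indices[new_base]
    return {year: round(value * 100 / divisor, 2) for year, value in indices.items()}


# Index numbers of prices from Illustration 17 (1959 = 100)
old_series = {1959: 100, 1960: 110, 1961: 120, 1962: 200, 1963: 400,
              1964: 410, 1965: 400, 1966: 380, 1967: 370, 1968: 340}

new_series = shift_base(old_series, 1965)   # now 1965 = 100
for year, value in new_series.items():
    print(year, value)
```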

Illustration 17. The following are the index numbers of prices (1959=100):

Year   Index     Year   Index
1959    100      1964    410
1960    110      1965    400
1961    120      1966    380
1962    200      1967    370
1963    400      1968    340

Shift the base from 1959 to 1965 and recast the index numbers. (Mysore, 1972)


Solution : INDEX NUMBERS WITH 1965 AS BASE (1965=100)

Year   Index Numbers   Index Numbers
       (1959=100)      (1965=100)
1959   100             (100/400) x 100 = 25.0
1960   110             (110/400) x 100 = 27.5
1961   120             (120/400) x 100 = 30.0
1962   200             (200/400) x 100 = 50.0
1963   400             (400/400) x 100 = 100.0
1964   410             (410/400) x 100 = 102.5
1965   400             (400/400) x 100 = 100.0
1966   380             (380/400) x 100 = 95.0
1967   370             (370/400) x 100 = 92.5
1968   340             (340/400) x 100 = 85.0
The new series with 1965 as base is obtained very easily by dividing each entry
of the first column by 400, i.e., the value of the index for 1965, and multiplying the
ratio by 100. Thus the index number for 1959 on the new base is

$$\frac{\text{Index number for 1959}}{\text{Index number for 1965}} \times 100 = \frac{100}{400} \times 100 = 25$$

and the index number for 1960 on the new base is

$$\frac{\text{Index number for 1960}}{\text{Index number for 1965}} \times 100 = \frac{110}{400} \times 100 = 27.5$$

In a similar manner the other indices can also be obtained.
It should be carefully noted that the above method of shifting the
base will not necessarily coincide with the method in which we start anew
with the original data and recompute the whole series with the new base.
It all depends on how the index is constructed and what weights are
being used. Nevertheless, since it is sometimes impossible to do otherwise
in practice, the simple method illustrated above is often employed
regardless of whether a complete recomputation of the index would
produce identical results.*
Splicing 

The problem of combining two or more overlapping series of index
numbers into one continuous series is called splicing. The need for splicing
arises for securing continuity in comparison. It happens quite often that
an index is discontinued because its base has become too old. A new
index may be started with the same items and some recent year as base. If
it is desired to connect the new index number with the one discontinued,
the second index number would be spliced to the first one, with the result
that the index would enable comparison with the old base. The process
of splicing is very simple and is akin to that used in shifting the base.
Expressed in the form of a formula:

$$\text{Spliced Index No.} = \frac{\text{Index No. of current year} \times \text{Old Index No. of new base year}}{100}$$


Illustration 18. Splice index B to index A from the following data:

Year   Index A   Index B
1950     100
1951     110
1952     112
1959     138
1960     150       100
1961               120
1962               140
1963               130
1964               150

Solution : INDEX B SPLICED TO INDEX A (1950 AS BASE)

Year   Index A   Index B   Index B spliced to Index A
                           (1950=100)
1950     100                100
1951     110                110
1952     112                112
1959     138                138
1960     150       100      (150/100) x 100 = 150
1961               120      (150/100) x 120 = 180
1962               140      (150/100) x 140 = 210
1963               130      (150/100) x 130 = 195
1964               150      (150/100) x 150 = 225

* Freund and Williams : Modern Business Statistics.


The spliced index now refers to 1950 as base and we can make a continuous
comparison of index numbers from 1950 onwards.

In the above case it is also possible to splice the new index in such a
manner that a comparison could be made with 1960 as base. This would
be done by multiplying the old index by the ratio 100/150. Thus the spliced
index for 1950 would be (100/150) x 100 = 66.7, for 1951, (100/150) x 110 = 73.3,
for 1952, (100/150) x 112 = 74.7, etc. This process appears to be more useful
because a recent year can be kept as a base. However, much would depend
upon the object.
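Both forms of splicing just described multiply one series by a constant ratio taken from the overlap year. A minimal Python sketch using the figures of the splicing example above (index A on base 1950, index B on base 1960, overlapping in 1960); the function names are hypothetical and results are rounded to one decimal place for display.

```python
def splice_forward(new_index, old_value_at_overlap):
    """Put a new series on the old base:
    spliced = new index x (old index of the overlap year) / 100."""
    return {year: round(value * old_value_at_overlap / 100, 1)
            for year, value in new_index.items()}


def splice_backward(old_index, old_value_at_overlap):
    """Put an old series on the new base (overlap year = 100):
    spliced = old index x 100 / (old index of the overlap year)."""
    return {year: round(value * 100 / old_value_at_overlap, 1)
            for year, value in old_index.items()}


index_a = {1950: 100, 1951: 110, 1952: 112, 1959: 138, 1960: 150}  # 1950 = 100
index_b = {1960: 100, 1961: 120, 1962: 140, 1963: 130, 1964: 150}  # 1960 = 100
overlap = index_a[1960]  # 1960 is the year common to both series

b_on_1950_base = splice_forward(index_b, overlap)
a_on_1960_base = splice_backward(index_a, overlap)
print(b_on_1950_base)
print(a_on_1960_base)
```

Which direction to use depends, as the text says, on the object: splicing forward keeps the old base, splicing backward keeps the more recent one.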


It shall be clear from above that splicing is very useful for enabling
continuous comparison of index numbers. However, it should be noted that
splicing gives strictly accurate results only where the geometric mean
is used, because in such a case index numbers are reversible. However,
because of difficulties of computation, the geometric mean is not very
often used in constructing index numbers.

Use of Index Numbers in Deflating

The value (or purchasing power) of a rupee is simply the reciprocal of an appropriate
price index written as a proportion. If prices increase by 60 per cent, the price
index is 1.60 and what a rupee will buy is only 1/1.60 or 5/8 of what it
used to buy. In other words, the purchasing power of the rupee is 5/8 of
what it was, or approximately 63 paise. Similarly, if prices increase by 25
per cent, the price index is 1.25 (125 per cent), and the purchasing power
of the rupee is 1/1.25 = 0.80 = 80 paise.

It shall be clear from above that since the value of money goes down 
with rising prices the workers or the salaried people are interested not so 
much in money wages as in real wages, i.e., not how much they earn but 
how much their income or wage will buy. 

For calculating real wages we can multiply money wages by a
quantity measuring the purchasing power of the rupee, or better, we divide
the cash wages by an appropriate price index. This process is referred to
as deflating. In principle it appears to be very simple, but in practice the
main difficulty consists in finding an appropriate index to deflate a given set of
values, i.e., appropriate deflators. The process of deflating can be expressed
in the form of a formula as follows:

$$\text{Real wage} = \frac{\text{Money wage}}{\text{Price index}} \times 100$$

Illustration 19. The following table gives the annual wages of a worker together
with the price index numbers. Compute the index numbers of real income and
interpret them:


Year   Wages   Price Index
1968    200       100
1969    240       160
1970    350       280
1971    360       290
1972    360       300
1973    370       320
1974    375       330

(B. Com., Lucknow, 1974)


Solution : INDEX NUMBERS OF REAL WAGES

Year   Wages      Price Index   Real Wages                  Real Wage Indices
       (in Rs.)                                             (1968=100)
1968    200         100         (200/100) x 100 = 200       100.0
1969    240         160         (240/160) x 100 = 150        75.0
1970    350         280         (350/280) x 100 = 125        62.5
1971    360         290         (360/290) x 100 = 124        62.0
1972    360         300         (360/300) x 100 = 120        60.0
1973    370         320         (370/320) x 100 = 116        58.0
1974    375         330         (375/330) x 100 = 114        57.0


The index number of real wages has fallen from 100 in 1968 to 57 in 1974. In
other words, although the money wage has increased from Rs. 200 in 1968
to Rs. 375 in 1974, the worker is not better off; in real terms his wage has fallen by 43 per cent.
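The deflating formula and the purchasing-power rule can be sketched together in Python, using the data of Illustration 19. This is an illustrative sketch with hypothetical function names; following the worked solution, each real wage is rounded to the nearest rupee before the real wage index is formed, which is why 1971, 1973 and 1974 give 62.0, 58.0 and 57.0 rather than the unrounded 62.1, 57.8 and 56.8.

```python
def deflate(money_values, price_index):
    """Real value = money value / price index x 100,
    rounded to whole rupees as in the worked solution."""
    return {year: round(money_values[year] * 100 / price_index[year])
            for year in money_values}


def to_index(values, base_year):
    """Express a series as an index with base_year = 100 (one decimal)."""
    return {year: round(v * 100 / values[base_year], 1) for year, v in values.items()}


def purchasing_power(price_index):
    """Value of one rupee in base-period rupees: reciprocal of the index as a proportion."""
    return {year: 100 / p for year, p in price_index.items()}


wages = {1968: 200, 1969: 240, 1970: 350, 1971: 360, 1972: 360, 1973: 370, 1974: 375}
prices = {1968: 100, 1969: 160, 1970: 280, 1971: 290, 1972: 300, 1973: 320, 1974: 330}

real_wages = deflate(wages, prices)
real_wage_index = to_index(real_wages, 1968)
rupee_value = purchasing_power(prices)

for year in wages:
    print(year, real_wages[year], real_wage_index[year], round(rupee_value[year], 2))
```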


The method discussed above is frequently used to deflate individual
values, value series or value indices. Its special use is in problems dealing …

CONSUMER PRICE INDEX NUMBERS

Meaning and Need


The consumer price index numbers, also known as cost of living index
numbers, are generally intended to represent the average change over time
in the prices paid by the ultimate consumer for a specified basket of goods
and services. The need for constructing consumer price indices arises because
the general index numbers fail to give an exact idea of the effect of the
change in the general price level on the cost of living of different classes of
people. For example, the consumption pattern of rich, poor and middle class people
varies widely. Not only this, the consumption habits of the people
of the same class differ from place to place. For example, the
mode of expenditure of a lower division clerk living in Delhi may differ
widely from that of another clerk of the same category living in, say,
Madras. The consumer price index helps us in determining the effect of
rise and fall in prices on different classes of consumers living in different
areas. The construction of such an index is of great significance because
very often the demand for a higher wage is based on the cost of living
index, and the wages and salaries in most countries are adjusted in
accordance with the consumer price index.

It should be carefully noted that the cost of living index does not
measure the actual cost of living nor the fluctuations in the cost of living
due to causes other than the change in the price level; its object is to find
out how much more the consumers of a particular class have to pay for
a certain basketful of goods and services in a given period compared to
the base period. To bring out this fact clearly, the Sixth International
Conference of Labour Statisticians recommended that "the term 'cost of
living index' should be replaced in appropriate circumstances by the terms
'price of living index', 'cost of living price index' or 'consumer price index'."
At present, the three terms, namely, cost of living index, consumer price
index and retail price index, are in use in different countries with practically
no difference in their connotation.

It should be clearly understood at the very outset that two different
indices representing two different geographical areas cannot be used to
compare actual living costs of the two areas. A higher index for one area
than for another for the same period is no indication that living costs are
higher in the one than in the other. All it means is that, as compared with
the base periods, prices have risen more in one area than in the other. But actual
costs depend not only on the rise in prices as compared with the base
period, but also on the actual cost of living for the base period, which will
vary for different regions and for different classes of population.

Utility of the Consumer Price Indices 

The Consumer Price Indices are of great significance as can be seen 
from the following : 

(1) The most common use of these indices is in wage negotiations
and wage contracts. Automatic adjustments of wage or dearness allowance 
component of wages are governed in many countries by such indices. 

(2) At Governmental level, the index numbers are used for wage 
policy, price policy, rent control, taxation and general economic policies. 

(3) The index numbers are also used to measure changing purchasing 
power of the currency, real income, etc. 

(4) Index numbers are also used for analysing markets for particular
kinds of goods and services. 


Construction of a Consumer Price Index 
The following are the steps in constructing a consumer price index:

(1) Decision about the class of people for whom the index is meant.
It is absolutely essential to decide clearly the class of people for whom the
index is meant, i.e., whether it relates to industrial workers, teachers,
officers, etc. The scope of the index must be clearly defined. For example,
when we talk of teachers, it must be decided whether we are referring to primary teachers, middle
class teachers, etc., or to all the teachers taken together. Along with the
class of people it is also necessary to decide the geographical area covered
by the index. Thus in the example taken above it is to be decided whether
all the teachers living in Delhi are to be included or only those living in a
particular locality of Delhi, say, Chandni Chowk area, Karol Bagh, etc.

Note (on Illustration 19): If we are asked only to compute real wages, the real wage indices need not be
calculated.


(2) Conducting family budget enquiry. The enquiry is conducted on a random basis. By applying the lottery
method some families are selected from the total number and their family
budgets are scrutinized in detail. The items on which the money is spent
are classified into certain well-accepted groups, namely:

(i) Food.
(ii) Clothing.
(iii) Fuel and Lighting.
(iv) House Rent.
(v) Miscellaneous.


Each of these groups is further divided into sub-groups. For example,
the broad group 'food' may be divided into wheat, rice, pulses, sugar, etc.
The commodities included are those which are generally consumed by the
people for whom the index is meant. Through the family budget enquiry an
average budget is prepared which is the standard budget for that class
of people.

(3) Obtaining price quotations. The collection of retail prices is a difficult task because such prices may vary from place to place, shop to shop and person
to person. Price quotations should be obtained from the localities in
which the class of people concerned reside or from where they usually make
their purchases. Some of the principles recommended to be observed
in the collection of retail price data required for purposes of construction
of cost of living indices are described below:


(a) The retail prices should relate to a fixed list of items and for
each item, the quality should be fixed by means of suitable specifications.

(b) Retail prices should be those actually charged to consumers for
cash sales.




(c) Discount should be taken into account if it is automatically
given to all customers.

(d) In a period of price control or rationing, where illegal prices are
charged openly, such prices should be taken into account along with the
controlled prices.


The most difficult problem in practice is to follow principle (a), i.e., 
the problem of keeping the weights assigned and qualities of the basket 
of goods and services constant with a view to ensuring that only the effect 
of price change is measured. To conform to uniform qualities, the accep- 
ted method is to draw up detailed descriptions or specifications of the 
items priced for the use of persons furnishing or collecting the price
quotations.


Since prices form the most important component of cost of
living indices, considerable attention has to be paid to the methods
of price collection and to the price collection personnel. Prices
are collected usually by special agents, through mailed question-
naires or in some cases through published price lists. The greatest
reliance can be placed on price collection through special
agents, as they visit the selected retail outlets and collect the prices from
them. However, these agents should be properly selected and trained and
should be given a manual of instructions as well as a manual of specifications
of items to be priced. Appropriate methods of price verification should
be followed, such as 'check pricing', in which price quotations are verified by
means of duplicate prices obtained by different agents, or 'purchase check-
ing', in which actual purchases of goods are made.


After quotations have been collected from all retail outlets, an aver-
age price for each of the items included in the index has to be worked
out. Such averages are first calculated for the base period of the
index and later every month if the index is maintained on
a monthly basis. The method of averaging the quotations should
be such as to yield unbiased estimates of average prices as being
paid by the group as a whole. This, of course, will depend upon the
method of selection of retail outlets and also the scope of the index.


In order to convert the prices into index numbers the prices or their
relatives must be weighted. The need for weighting arises because the
relative importance of various items for different classes of people is not the
same. For this reason, the cost of living index is always a weighted index.
While conducting the family budget enquiry the amount spent on each
commodity by an average family is determined, and these amounts constitute the weights.
Percentages of expenditure on the different items constitute the 'individual
weights' allocated to the corresponding price relatives, and the percentage
expenditure on the five groups constitutes the 'group weights'.

Method of Constructing the Index 

After the above-mentioned problems are carefully decided the index 
may be constructed by applying any of the following methods : 

(1) Aggregate Expenditure Method or Aggregative Method. 

(2) Family Budget Method or The Method of Weighted Relatives. 


1. Aggregate Expenditure Method. When this method is applied
the quantities of commodities consumed by the particular group in the
base year are estimated, and these constitute the weights. The prices of com-
modities for various groups for the current year are multiplied by the
quantities consumed in the base year and the aggregate expenditure in-
curred in buying those commodities is obtained. In a similar manner the
prices of the base year are multiplied by the quantities of the base year and
the aggregate expenditure for the base period is obtained. The aggregate
expenditure of the current year is divided by the aggregate expenditure of
the base year and the quotient is multiplied by 100. Symbolically,

$$\text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

This is in fact the Laspeyres method discussed earlier. This method
is the most popular method for constructing consumer price index.


2. Family Budget Method. When this method is applied the family
budgets of a large number of people for whom the index is meant
are carefully studied and the aggregate expenditure of an average
family on various items is estimated. These constitute the weights.
The weights are thus the value weights obtained by multiplying the prices
by quantities consumed (i.e., p0q0). The price relatives for each commo-
dity are obtained, these price relatives are multiplied by the value
weights for each item, and the sum of the products is divided by the sum of the weights.
Symbolically,

$$\text{Consumer Price Index} = \frac{\sum PV}{\sum V}$$

where $P = \frac{p_1}{p_0} \times 100$ for each item and $V$ = value weights, i.e., $p_0 q_0$.

This method is the same as the weighted average of price relatives
method discussed earlier.


It should be noted that the answer obtained by applying the aggregate 
expenditure method and the family budget method shall be the same*. 


Illustration 20. Construct the consumer price index number for 1970 on the
basis of 1969 from the following data using (i) the aggregate expenditure method, and
(ii) the family budget method.

Article   Quantity consumed   Unit      Price in 1969   Price in 1970
          in 1969                       (Rs.)           (Rs.)
Rice      6 Quintals          Quintal    5.75            6.00
Wheat     6 Quintals          Quintal    5.00            8.00
Gram      1 Quintal           Quintal    6.00            9.00
Arhar     6 Quintals          Quintal    8.00           10.00
Ghee      4 Kg.               Kg.        2.00            1.50
Sugar     1 Quintal           Quintal   20.00           15.00


* The numerator and denominator in both the methods are the same, as can be
shown. The numerator $\sum p_1 q_0$ of the Laspeyres method corresponds to $\sum PV$ of the
family budget method, since

$$PV = \frac{p_1}{p_0} \times 100 \times p_0 q_0 = 100\, p_1 q_0$$

so that $\sum PV = 100 \sum p_1 q_0$. The denominator in both the methods is also the same, since
$\sum p_0 q_0 = \sum V$.
Solution : COMPUTATION OF CONSUMER PRICE INDEX NUMBER FOR 1970
(Base 1969=100) BY THE AGGREGATE EXPENDITURE METHOD

Article   Quantities   Unit   Prices in   Prices in
          consumed            1969        1970
          q0                  p0          p1          p1q0      p0q0
Rice      6 Qu.        Qu.     5.75        6.00        36.00     34.50
Wheat     6 Qu.        Qu.     5.00        8.00        48.00     30.00
Gram      1 Qu.        Qu.     6.00        9.00         9.00      6.00
Arhar     6 Qu.        Qu.     8.00       10.00        60.00     48.00
Ghee      4 Kg.        Kg.     2.00        1.50         6.00      8.00
Sugar     1 Qu.        Qu.    20.00       15.00        15.00     20.00
                                             Σp1q0 = 174.00   Σp0q0 = 146.50

$$\text{Consumer Price Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{174}{146.5} \times 100 = 118.8$$


CONSTRUCTION OF CONSUMER PRICE INDEX NUMBER FOR 1970
(Base 1969=100) BY THE FAMILY BUDGET METHOD

Article   Quantity   Unit   Price in   Price in   P = (p1/p0) x 100   V = p0q0   PV
          consumed          1969       1970
          q0                p0         p1
Rice      6 Qu.      Qu.     5.75       6.00      104.35              34.50     3,600
Wheat     6 Qu.      Qu.     5.00       8.00      160.00              30.00     4,800
Gram      1 Qu.      Qu.     6.00       9.00      150.00               6.00       900
Arhar     6 Qu.      Qu.     8.00      10.00      125.00              48.00     6,000
Ghee      4 Kg.      Kg.     2.00       1.50       75.00               8.00       600
Sugar     1 Qu.      Qu.    20.00      15.00       75.00              20.00     1,500
                                                             ΣV = 146.50   ΣPV = 17,400

$$\text{Consumer Price Index} = \frac{\sum PV}{\sum V} = \frac{17{,}400}{146.5} = 118.8$$


Thus, the answer is the same by both the methods. However, the reader should
prefer the aggregate expenditure method because it is far easier to apply compared
to the family budget method.
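The equivalence of the two methods can be checked numerically. The sketch below, with hypothetical function names, recomputes Illustration 20 both ways; the quantities for gram (1 quintal) and arhar (6 quintals) are the ones implied by the p0q0 column of the solution.

```python
# Illustration 20: article -> (quantity consumed in 1969 q0, price in 1969 p0, price in 1970 p1)
basket = {
    "Rice": (6, 5.75, 6.00),
    "Wheat": (6, 5.00, 8.00),
    "Gram": (1, 6.00, 9.00),
    "Arhar": (6, 8.00, 10.00),
    "Ghee": (4, 2.00, 1.50),
    "Sugar": (1, 20.00, 15.00),
}


def aggregate_expenditure_index(items):
    """Laspeyres form: sum(p1*q0) / sum(p0*q0) x 100."""
    current_cost = sum(q0 * p1 for q0, p0, p1 in items.values())
    base_cost = sum(q0 * p0 for q0, p0, p1 in items.values())
    return current_cost / base_cost * 100


def family_budget_index(items):
    """Weighted mean of price relatives P = p1/p0 x 100 with value weights V = p0*q0."""
    sum_pv = 0.0
    sum_v = 0.0
    for q0, p0, p1 in items.values():
        v = p0 * q0
        sum_pv += (p1 / p0 * 100) * v
        sum_v += v
    return sum_pv / sum_v


print(round(aggregate_expenditure_index(basket), 2))
print(round(family_budget_index(basket), 2))
```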

Illustration 21. Construct a cost of living index from the following indices, the
weights being food 55, clothing 15, rent 20, fuel and lighting 15 and miscellaneous 5.

Year   Food   Rent   Clothing   Fuel and   Miscellaneous
                                lighting
1968   100    100    100        100        100
1969   105    104     98        100        110
1970   110    112    102        101        115
1971   112    115    105        103        120
Solution : CONSTRUCTION OF COST OF LIVING INDEX NUMBERS

                                   1969             1970             1971
Items                  Weights   Index   IW       Index   IW       Index   IW
                       W         I                I                I
1. Food                  55      105    5,775     110    6,050     112    6,160
2. Rent                  20      104    2,080     112    2,240     115    2,300
3. Clothing              15       98    1,470     102    1,530     105    1,575
4. Fuel and Lighting     15      100    1,500     101    1,515     103    1,545
5. Miscellaneous          5      110      550     115      575     120      600
                      ΣW = 110        ΣIW = 11,375     ΣIW = 11,910     ΣIW = 12,180

$$\text{Cost of Living Index for 1969} = \frac{\sum IW}{\sum W} = \frac{11{,}375}{110} = 103.41$$

$$\text{Cost of Living Index for 1970} = \frac{11{,}910}{110} = 108.27$$

$$\text{Cost of Living Index for 1971} = \frac{12{,}180}{110} = 110.73$$
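The calculation of Illustration 21 is a weighted arithmetic mean of group indices, ΣIW/ΣW, repeated for each year. A minimal Python sketch, assuming the weights of the solution table (food 55, rent 20, clothing 15, fuel and lighting 15, miscellaneous 5); the function name is hypothetical. The same function applies unchanged to Illustration 23.

```python
weights = {"Food": 55, "Rent": 20, "Clothing": 15,
           "Fuel and lighting": 15, "Miscellaneous": 5}

group_indices = {
    1969: {"Food": 105, "Rent": 104, "Clothing": 98,
           "Fuel and lighting": 100, "Miscellaneous": 110},
    1970: {"Food": 110, "Rent": 112, "Clothing": 102,
           "Fuel and lighting": 101, "Miscellaneous": 115},
    1971: {"Food": 112, "Rent": 115, "Clothing": 105,
           "Fuel and lighting": 103, "Miscellaneous": 120},
}


def cost_of_living_index(indices, weights):
    """Weighted arithmetic mean of group indices: sum(I*W) / sum(W)."""
    total_iw = sum(indices[group] * w for group, w in weights.items())
    return total_iw / sum(weights.values())


results = {year: round(cost_of_living_index(idx, weights), 2)
           for year, idx in group_indices.items()}
print(results)
```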


Illustration 22. An enquiry into the budgets of middle class families in a city in
India gave the following information:

                 Food      Rent     Clothing   Fuel     Miscellaneous
Expense on       35%       15%      20%        10%      20%
Prices (1968)    Rs. 150   Rs. 30   Rs. 75     Rs. 25   Rs. 40
Prices (1969)    Rs. 145   Rs. 30   Rs. 65     Rs. 23   Rs. 45

What changes in the cost of living figures of 1969 as compared with 1968 are
seen? (B. Com., Punjab, 1973)
Solution : CONSTRUCTION OF COST OF LIVING INDEX NUMBER FOR
1969 WITH 1968 AS THE BASE

Items of        Price in Rs.      Price Relatives       Weights   PW
Expenditure     1968    1969      (1968=100)            W
                p0      p1        P = (p1/p0) x 100
Food            150     145        96.67                35        3,383.33
Rent             30      30       100.00                15        1,500.00
Clothing         75      65        86.67                20        1,733.33
Fuel             25      23        92.00                10          920.00
Miscellaneous    40      45       112.50                20        2,250.00
                                                   ΣW = 100   ΣPW = 9,786.67

$$\text{Cost of Living Index} = \frac{\sum PW}{\sum W} = \frac{9{,}786.67}{100} = 97.87$$


Thus a fall of 2.13 per cent has taken place in the cost of living of middle class
families in the given city of India in 1969 as compared with 1968.


Illustration 23. Construct the cost of living index number from the table given
below:

Group                  Index for 1966   Expenditure
1. Food                  550            46%
2. Clothing              215            10%
3. Fuel and Lighting     220             7%
4. House Rent            150            12%
5. Miscellaneous         275            25%

(M. Com., Delhi, 1972)
Solution : CONSTRUCTION OF COST OF LIVING INDEX NUMBER

Group               Index Number   Expenditure   IW
                    I              W
Food                  550            46          25,300
Clothing              215            10           2,150
Fuel and Lighting     220             7           1,540
House Rent            150            12           1,800
Miscellaneous         275            25           6,875
                                 ΣW = 100     ΣIW = 37,665

$$\text{Cost of Living Index} = \frac{\sum IW}{\sum W} = \frac{37{,}665}{100} = 376.65$$
Precautions while using Consumer Price Index

Quite often the consumer price indices are misinterpreted. Hence
while using these indices the following points should be kept in mind:

1. As pointed out earlier, the consumer price index measures changes
in the retail prices only in the given period compared to the base period; it
does not tell us anything about variations in living standards at two
different places. Thus if the cost of living index for the working class for
Bombay is 175 and for Delhi 150 for the same period and for the same
class of people, it does not necessarily mean that living costs are higher in
Bombay than in Delhi.

2. While constructing the index it is assumed that the basket does
not change. However, as one moves away from the base, particularly in
situations of shortages, the basket itself undergoes a change, and working
out changes in the cost of buying the old basket may become unrealistic.
Revising the basket, however, is a difficult task. The Sixth International Conference of Labour
Statisticians recommended that the pattern of consumption should be
examined and the weights adjusted, if necessary, at intervals of not more
than ten years to correspond to changes in the consumption pattern. The
index also does not take into account changes in qualities. Unlike changes
in consumption pattern, changes in qualities of goods and services are more
frequent, and when a marked change in the quality of items occurs
appropriate adjustments should be made to ensure that the index takes into
account changes in qualities also. But in practice it is a difficult proposi-
tion to follow, and, therefore, constant qualities are assumed at two
different dates, which again is a shaky assumption.

3. Like any other index, the consumer price index is based on a
sample. While constructing the index, sampling is used at every stage:
in the selection of commodities, in obtaining price quotations, in selecting
families for the family budget enquiry, etc. The accuracy of the index thus
hinges upon the use of sampling methods. The consumption pattern 
derived from the expenditure data of a sample of households covered in 
the course of family budget enquiry has to be representative of all the items 
in the average budget, the localities from which price data are collected 
have to be representative of all localities from which the population group 
makes purchases, the retail outlets from which prices are collected have 
to be representative of all the retail outlets patronised by the population 
group, etc. However, it is often difficult to ensure perfect representative- 
ness and in the absence of this the index may fail to provide the real 
picture. 

INDEX NUMBER OF INDUSTRIAL PRODUCTION 

The index number of industrial production is designed to measure 
increase or decrease in the level of industrial production in a given period 
compared to some base period. It should be noted that such an index 
measures changes in the quantum of production and not in values. For 
constructing such an index it is necessary to obtain data about the level 
of industrial output in the base period and the given period. Usually 
data about production are collected under the following heads : 

1. Textile Industries—cotton, woollen, silk, etc. 

2. Mining Industries—iron ore, coal, copper, petroleum, etc. 

3. Metallurgical Industries—iron and steel, etc. 


4. Mechanical Industries—locomotives, ships, aeroplanes, etc.

5. Industries subject to excise duties—sugar, tobacco, match, etc. 

6. Miscellaneous—glass, soap, chemical, cement, etc. 

The figures of output for the various industries classified above are 
obtained on a monthly, quarterly or yearly basis. Weights are assigned 
to various industries on the basis of some criteria such as capital invested, 
turnover, net output, production, etc. Usually the weights in the index 
are based on the values of net output of different industries. The index of 
industrial production is obtained by taking the weighted arithmetic mean or
weighted geometric mean of the relatives. When the weighted arithmetic mean is used the
formula for constructing the index becomes:

$$\text{Index of Industrial Production} = \frac{\sum \left( \dfrac{q_1}{q_0} \times 100 \right) W}{\sum W}$$

where $q_1$ = quantity produced in the given period,
$q_0$ = quantity produced in the base period,
$W$ = relative importance of different outputs.

For determining the relative share of an individual output in total
output, the concept of value added is most commonly used.
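The formula above, and its geometric-mean counterpart, can be sketched in Python as follows. This is an illustration under stated assumptions: the weights are simply taken as given (in practice they would usually come from net output or value added), the function names are hypothetical, and the two-industry output figures are made up purely to show how quantity relatives are formed. The last lines apply both averages to the indices and weights of Illustration 24, taking the weight of mineral production as 7, as in its solution table; the unrounded geometric mean comes to about 206.6.

```python
import math


def quantity_relatives(q0, q1):
    """Quantity relatives q1/q0 x 100 for each industry."""
    return [b * 100 / a for a, b in zip(q0, q1)]


def weighted_arithmetic_index(relatives, weights):
    """sum(relative * W) / sum(W)"""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


def weighted_geometric_index(relatives, weights):
    """antilog( sum(W * log10(relative)) / sum(W) ), as with log tables."""
    log_sum = sum(w * math.log10(r) for r, w in zip(relatives, weights))
    return 10 ** (log_sum / sum(weights))


# Hypothetical base-period and given-period outputs for two industries
print(quantity_relatives([50, 20], [60, 25]))   # [120.0, 125.0]

# Illustration 24: indices and weights of business activity
indices = [250, 135, 200, 135, 325, 300]
weights = [36, 7, 24, 20, 7, 6]
print(round(weighted_arithmetic_index(indices, weights), 1))
print(round(weighted_geometric_index(indices, weights), 1))
```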
Illustration 24. Construct the index number of business activity in India from
the following data. Use (a) arithmetic mean, and (b) geometric mean.

Items                        Weightage   Index
1. Industrial production       36         250
2. Mineral production           7         135
3. Internal trade              24         200
4. Financial activity          20         135
5. Exports and imports          7         325
6. Shipping activity            6         300

(B. Com., Gwalior, 1976)
Solution : CONSTRUCTION OF INDEX NUMBER OF BUSINESS ACTIVITY

S. No.   Items                    Weightage   Index   IW
                                  W           I
1.       Industrial production     36          250     9,000
2.       Mineral production         7          135       945
3.       Internal trade            24          200     4,800
4.       Financial activity        20          135     2,700
5.       Exports and imports        7          325     2,275
6.       Shipping activity          6          300     1,800
                              ΣW = 100            ΣIW = 21,520

$$\text{Index No. of business activity} = \frac{\sum IW}{\sum W} = \frac{21{,}520}{100} = 215.2$$
INDEX OF BUSINESS ACTIVITY USING GEOMETRIC MEAN

Items                    Index I   Log I    W    Log I x W
Industrial production     250      2.3979   36    86.3244
Mineral production        135      2.1303    7    14.9121
Internal trade            200      2.3010   24    55.2240
Financial activity        135      2.1303   20    42.6060
Exports and imports       325      2.5119    7    17.5833
Shipping activity         300      2.4771    6    14.8626
                                      ΣW = 100   Σ(log I x W) = 231.5124
$$\text{Index No.} = \text{Antilog}\left(\frac{\sum (\log I \times W)}{\sum W}\right) = \text{Antilog}\left(\frac{231.5124}{100}\right) = \text{Antilog } 2.3151 = 206.6$$

The second index (based on the geometric mean) is better.

MISCELLANEOUS ILLUSTRATIONS

Illustration 25. The price quotations of four different commodities for 1972 and
1973 are given below. Calculate the index number for 1973 with 1972 as base by using
(i) the simple average of price relatives, and (ii) the weighted average of price relatives.


Commodities   Unit      Weight      Price (Rs.)
                        (Rs.'000)   1972    1973
A             Kg.        5          2.00    4.50
B             Quintal    7          2.50    3.20
C             Dozen      6          3.00    4.50
D             Kg.        2          1.00    1.80

(I.C.W.A., 1974)


Solution : INDICES BY USING (i) SIMPLE AVERAGE OF PRICE RELATIVES
(ii) WEIGHTED AVERAGE OF PRICE RELATIVES

Commodities   Unit      Weight      p0     p1     P = (p1/p0) x 100   PV
                        (Rs.'000)
                        V
A             Kg.        5          2.00   4.50   225                  1,125
B             Quintal    7          2.50   3.20   128                    896
C             Dozen      6          3.00   4.50   150                    900
D             Kg.        2          1.00   1.80   180                    360
                        ΣV = 20                   ΣP = 683             ΣPV = 3,281

(i) Simple average of price relatives:

$$P_{01} = \frac{\sum P}{N} = \frac{683}{4} = 170.75$$

(ii) Weighted average of price relatives method:

$$P_{01} = \frac{\sum PV}{\sum V} = \frac{3{,}281}{20} = 164.05$$
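A short Python check of Illustration 25, contrasting the simple and weighted averages of price relatives. This is a sketch; the dictionary layout is an assumption made for illustration.

```python
# Illustration 25: commodity -> (price in 1972 p0, price in 1973 p1); weights in Rs.'000
prices = {"A": (2.00, 4.50), "B": (2.50, 3.20), "C": (3.00, 4.50), "D": (1.00, 1.80)}
weights = {"A": 5, "B": 7, "C": 6, "D": 2}

# Price relatives P = p1/p0 x 100
relatives = {k: p1 * 100 / p0 for k, (p0, p1) in prices.items()}

# (i) simple average: every commodity counts equally
simple_average = sum(relatives.values()) / len(relatives)

# (ii) weighted average: sum(P*V) / sum(V)
weighted_average = (sum(relatives[k] * weights[k] for k in weights)
                    / sum(weights.values()))

print(round(simple_average, 2), round(weighted_average, 2))
```

The weighted figure is lower because the commodity with the largest weight (B) shows the smallest rise.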
Illustration 26. Given the following data:

Commodity   Base year data    Current year data
            p0      q0        p1       q1
A           1       10        1.5       8
B           5       12        6.0      10
C           8        5       10.0       2

Demonstrate the computation of price index number for current year using
(i) Laspeyres' formula, (ii) Paasche's formula, (iii) weighted arithmetic average of
price relatives, weights being values in the base year. (M. Com., Delhi, 1973)
Solution : CALCULATION OF PRICE INDICES

Commodity   p0   q0   p1     q1   p1q0   p0q0   p1q1   p0q1
A           1    10   1.5     8    15     10     12      8
B           5    12   6.0    10    72     60     60     50
C           8     5   10.0    2    50     40     20     16
                         Σp1q0 = 137   Σp0q0 = 110   Σp1q1 = 92   Σp0q1 = 74

(i) Laspeyres' method:

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{137}{110} \times 100 = 124.55$$

(ii) Paasche's method:

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{92}{74} \times 100 = 124.32$$

(iii) Weighted arithmetic mean of price relatives, weights being values in the base
year:

Commodity   p0   q0   p1     P = (p1/p0) x 100   V = p0q0   PV
A           1    10   1.5    150                 10          1,500
B           5    12   6.0    120                 60          7,200
C           8     5   10.0   125                 40          5,000
                                                ΣV = 110   ΣPV = 13,700

$$P_{01} = \frac{\sum PV}{\sum V} = \frac{13{,}700}{110} = 124.55$$
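The three formulae of Illustration 26 can be sketched in Python as below (function names hypothetical). The sketch also confirms the point visible in the solution: when the weights are base-year values p0q0, the weighted arithmetic mean of price relatives reduces to Laspeyres' index.

```python
def _sum_products(a, b):
    return sum(x * y for x, y in zip(a, b))


def laspeyres(p0, q0, p1):
    """sum(p1*q0) / sum(p0*q0) x 100"""
    return _sum_products(p1, q0) / _sum_products(p0, q0) * 100


def paasche(p0, p1, q1):
    """sum(p1*q1) / sum(p0*q1) x 100"""
    return _sum_products(p1, q1) / _sum_products(p0, q1) * 100


def weighted_relatives_base_values(p0, q0, p1):
    """sum(P*V) / sum(V) with P = p1/p0 x 100 and V = p0*q0."""
    values = [a * b for a, b in zip(p0, q0)]
    relatives = [b * 100 / a for a, b in zip(p0, p1)]
    return _sum_products(relatives, values) / sum(values)


# Illustration 26
p0, q0 = [1, 5, 8], [10, 12, 5]
p1, q1 = [1.5, 6.0, 10.0], [8, 10, 2]

print(round(laspeyres(p0, q0, p1), 2))
print(round(paasche(p0, p1, q1), 2))
print(round(weighted_relatives_base_values(p0, q0, p1), 2))
```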
Illustration 27. Construct (a) fixed base, and (b) chain base index numbers
from the following data relating to production of electricity.

Years   Production     Years   Production     Years   Production
        ('000 Kwt.)            ('000 Kwt.)            ('000 Kwt.)
1951    25             1957    31             1963    37
1952    27             1958    35             1964    38
1953    30             1959    40             1965    39
1954    24             1960    41             1966    40
1955    28             1961    36
1956    29             1962    32


Solution : CALCULATION OF FIXED BASE INDICES

Years   Production    Index Number
        ('000 Kwt.)   (1951=100)
1951    25            100
1952    27            (27/25) x 100 = 108
1953    30            (30/25) x 100 = 120
1954    24            (24/25) x 100 = 96
1955    28            (28/25) x 100 = 112
1956    29            (29/25) x 100 = 116
1957    31            (31/25) x 100 = 124
1958    35            (35/25) x 100 = 140
1959    40            (40/25) x 100 = 160
1960    41            (41/25) x 100 = 164
1961    36            (36/25) x 100 = 144
1962    32            (32/25) x 100 = 128
1963    37            (37/25) x 100 = 148
1964    38            (38/25) x 100 = 152
1965    39            (39/25) x 100 = 156
1966    40            (40/25) x 100 = 160
CALCULATION OF CHAIN BASE INDICES

Years   Production    Link Relatives            Chain Indices
        ('000 Kwt.)                             (1951=100)
1951    25            100                       100
1952    27            (27/25) x 100 = 108.00    (108.00 x 100)/100 = 108
1953    30            (30/27) x 100 = 111.11    (111.11 x 108)/100 = 120
1954    24            (24/30) x 100 = 80.00     (80.00 x 120)/100 = 96
1955    28            (28/24) x 100 = 116.67    (116.67 x 96)/100 = 112
1956    29            (29/28) x 100 = 103.57    (103.57 x 112)/100 = 116
1957    31            (31/29) x 100 = 106.90    (106.90 x 116)/100 = 124
1958    35            (35/31) x 100 = 112.90    (112.90 x 124)/100 = 140
1959    40            (40/35) x 100 = 114.29    (114.29 x 140)/100 = 160
1960    41            (41/40) x 100 = 102.50    (102.50 x 160)/100 = 164
1961    36            (36/41) x 100 = 87.80     (87.80 x 164)/100 = 144
1962    32            (32/36) x 100 = 88.89     (88.89 x 144)/100 = 128
1963    37            (37/32) x 100 = 115.63    (115.63 x 128)/100 = 148
1964    38            (38/37) x 100 = 102.70    (102.70 x 148)/100 = 152
1965    39            (39/38) x 100 = 102.63    (102.63 x 152)/100 = 156
1966    40            (40/39) x 100 = 102.56    (102.56 x 156)/100 = 160


Note: The above calculations show that the chain indices are the same as the 


fixed base index number. In case of one series this will always be so. 
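A minimal Python sketch, assuming the production figures are held in a dictionary keyed by year (the variable names are illustrative only), reproduces both series and confirms the note: for a single series the link relatives telescope, so chaining them gives exactly the fixed base figures.

```python
from fractions import Fraction

# Electricity production ('000 Kwt.) from Illustration 27
production = {
    1951: 25, 1952: 27, 1953: 30, 1954: 24, 1955: 28, 1956: 29,
    1957: 31, 1958: 35, 1959: 40, 1960: 41, 1961: 36, 1962: 32,
    1963: 37, 1964: 38, 1965: 39, 1966: 40,
}
years = sorted(production)
base_value = production[years[0]]

# Fixed base: every year compared directly with 1951
fixed = {y: Fraction(production[y], base_value) * 100 for y in years}

# Chain base: link relatives (this year / last year x 100) multiplied together
chain = {years[0]: Fraction(100)}
for prev, cur in zip(years, years[1:]):
    link = Fraction(production[cur], production[prev]) * 100
    chain[cur] = link * chain[prev] / 100

for y in years:
    print(y, float(fixed[y]), float(chain[y]))
```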
Illustration 28. From the data given below, construct an index number for the group of four commodities by using Fisher's Ideal Formula:

              Base year                   Current year
Commodity  Price per    Expenditure    Price per    Expenditure
           unit (Rs.)   (Rs.)          unit (Rs.)   (Rs.)
1              2            40             5            75
2              4            16             8            40
3              1            10             2            24
4              5            25            10            60
                                            (B. Com., Bombay, 1973)

Solution. Since we are given the price and the expenditure, we can obtain the quantities consumed by dividing the expenditure on each commodity by its price, and then apply Fisher's method.

Commodity   p0   q0   p1   q1   p1q0   p0q0   p1q1   p0q1
1            2   20    5   15    100     40     75     30
2            4    4    8    5     32     16     40     20
3            1   10    2   12     20     10     24     12
4            5    5   10    6     50     25     60     30
                               Σp1q0  Σp0q0  Σp1q1  Σp0q1
                               =202   =91    =199   =92

Applying Fisher's method:

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

Substituting the values:

$$P_{01} = \sqrt{\frac{202}{91} \times \frac{199}{92}} \times 100 = \sqrt{4.8015} \times 100 = 2.1912 \times 100 = 219.12$$
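The two steps of this solution (deriving quantities from expenditure, then applying Fisher's formula) can be sketched as follows. The tuple layout of `data` and the helper `dot` are assumptions made for the illustration, not notation from the text.

```python
import math

# (base price, base expenditure, current price, current expenditure) per commodity
data = [(2, 40, 5, 75), (4, 16, 8, 40), (1, 10, 2, 24), (5, 25, 10, 60)]

p0 = [d[0] for d in data]
q0 = [d[1] / d[0] for d in data]   # quantity = expenditure / price
p1 = [d[2] for d in data]
q1 = [d[3] / d[2] for d in data]

def dot(a, b):
    """Sum of pairwise products, e.g. sum(p1 * q0)."""
    return sum(x * y for x, y in zip(a, b))

laspeyres = dot(p1, q0) / dot(p0, q0)   # 202 / 91
paasche = dot(p1, q1) / dot(p0, q1)     # 199 / 92
fisher = math.sqrt(laspeyres * paasche) * 100
print(round(fisher, 2))
```

Fisher's index is simply the geometric mean of the Laspeyres and Paasche ratios, which is why the sketch computes those two first.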

Illustration 29. The average of wholesale prices was higher in 1957 than in 1956 by 15.1 per cent, the index numbers for the two years being 108.7 and 94.4 respectively (1950 = 100). This increase followed rises of 6.1, 1.0 and 2.8 per cent, each year being compared with the preceding. In 1953, prices were the same as in 1952, but 2.5 per cent below 1951. Prices in 1951 were 12.2 per cent below 1950. From these data, compute index numbers for each year from 1950 to 1957.  (M. Com., Agra, 1969)

Solution:
Year   Index No.                        Year   Index No.
1950   100                              1955   101.0 × 88.0/100 = 88.9
1951   100 - 12.2 = 87.8                1956   106.1 × 88.9/100 = 94.3
1952   97.5 × 87.8/100 = 85.6           1957   115.1 × 94.3/100 = 108.5
1953   = 85.6
1954   102.8 × 85.6/100 = 88.0
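The solution simply chains year-on-year percentage changes back to 1950. A minimal sketch, assuming (as the worked solution does) that the rises of 2.8, 1.0 and 6.1 per cent belong to 1954, 1955 and 1956 respectively, and rounding each index to one decimal as the text does:

```python
# Percentage change of each year over the preceding year, read from the question
changes = {
    1951: -12.2,  # 12.2 per cent below 1950
    1952: -2.5,   # 2.5 per cent below 1951
    1953: 0.0,    # same as 1952
    1954: 2.8,
    1955: 1.0,
    1956: 6.1,
    1957: 15.1,
}

index = {1950: 100.0}  # 1950 = 100
for year in sorted(changes):
    # index of this year = index of previous year x (100 + change) / 100,
    # rounded to one decimal at each step as in the worked solution
    index[year] = round(index[year - 1] * (100 + changes[year]) / 100, 1)

for year, value in index.items():
    print(year, value)
```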


Illustration 30. The following table gives the annual income of a teacher and the general index number of prices during 1960-68. Prepare the index number to show the changes in the real income of the teacher.

Year   Income   Price        Year   Income   Price
       (Rs.)    Index No.           (Rs.)    Index No.
1960    360      100         1965    640      290
1961    420      104         1966    680      300
1962    500      115         1967    720      320
1963    550      160         1968    750      330
1964    600      280
                                       (B. Com., Punjab, 1972)

Solution: INDEX NUMBER SHOWING CHANGES IN THE REAL INCOME OF THE TEACHER
Year   Income   Price       Real income                 Real income Index No.
       (Rs.)    Index No.   (Rs.)                       (1960 = 100)
1960    360      100        360/100 × 100 = 360.00       100.00
1961    420      104        420/104 × 100 = 403.85       112.18
1962    500      115        500/115 × 100 = 434.78       120.77
1963    550      160        550/160 × 100 = 343.75        95.49
1964    600      280        600/280 × 100 = 214.29        59.52
1965    640      290        640/290 × 100 = 220.69        61.30
1966    680      300        680/300 × 100 = 226.67        62.96
1967    720      320        720/320 × 100 = 225.00        62.50
1968    750      330        750/330 × 100 = 227.27        63.13
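The two columns of the solution amount to deflating money income by the price index and then re-basing the result on 1960. A minimal sketch, with illustrative variable names:

```python
income = {1960: 360, 1961: 420, 1962: 500, 1963: 550, 1964: 600,
          1965: 640, 1966: 680, 1967: 720, 1968: 750}
price_index = {1960: 100, 1961: 104, 1962: 115, 1963: 160, 1964: 280,
               1965: 290, 1966: 300, 1967: 320, 1968: 330}

# Deflate money income by the price index: income at 1960 prices
real_income = {y: income[y] / price_index[y] * 100 for y in income}
# Express real income as an index with 1960 = 100
real_index = {y: real_income[y] / real_income[1960] * 100 for y in income}

for y in sorted(income):
    print(y, round(real_income[y], 2), round(real_index[y], 2))
```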


Illustration 31. Construct the Cost of Living Index Number for the year 1966 from the following data:

                  1963                         1966
Commodity   Price (p0)   Quantity consumed (q0)   Price (pn)
A              25               16.0                 35
B              36                7.0                 48
C              12                3.5                 16
D               6                2.5                 10
E              28                4.0                 28
                                              (M. Com., Delhi, 1968)

Solution: CONSTRUCTION OF COST OF LIVING INDEX
Commodity   p0    q0     pn    pn q0   p0 q0
A           25   16.0    35     560     400
B           36    7.0    48     336     252
C           12    3.5    16      56      42
D            6    2.5    10      25      15
E           28    4.0    28     112     112
                              Σpn q0   Σp0 q0
                              =1,089   =821

$$\text{Cost of Living Index} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 = \frac{1089}{821} \times 100 = 132.6$$


Illustration 32. In 1968, for working class people, wheat was selling at an average price of Rs. 16 per 20 kg, cloth at Rs. 2 per metre, house rent at Rs. 30 per house and other items at Rs. 10 per unit. By 1971 the cost of wheat rose by Rs. 4 per 20 kg, house rent by Rs. 15 per house, and other items doubled in price. The working class cost of living index for the year 1971 (with 1968 as base) was 160. By how much did the price of cloth rise during the period 1968-71?

Solution. Let the price of cloth in 1971 be Rs. X per metre.

INDEX NUMBER FOR 1971
                     1968                  1971
Commodity        Price   Index No.    Price   Index No.
Wheat             16       100         20     20/16 × 100 = 125
Cloth              2       100          X     X/2 × 100 = 50X
House Rent        30       100         45     45/30 × 100 = 150
Miscellaneous     10       100         20     20/10 × 100 = 200
                                              Total = 475 + 50X

The index for 1971 is given as 160, i.e., the simple average of the four price relatives is 160. Therefore, the sum of the index numbers of the four commodities would be 160 × 4 = 640.

Hence   475 + 50X = 640
        50X = 640 - 475 = 165,  or  X = 165/50 = 3.30

Hence the price of cloth rose from Rs. 2 to Rs. 3.30 per metre, i.e., by Rs. 1.30 per metre.
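Following the solution's reading that the cost of living index here is the simple (unweighted) average of the four price relatives, the unknown cloth price can be backed out directly. A minimal sketch; the dictionary names are illustrative:

```python
from fractions import Fraction

price_1968 = {"wheat": 16, "cloth": 2, "house rent": 30, "others": 10}
price_1971_known = {"wheat": 20, "house rent": 45, "others": 20}
cost_of_living_1971 = 160  # simple average of the four price relatives

# Sum of the three known price relatives: 125 + 150 + 200 = 475
known = sum(Fraction(price_1971_known[k], price_1968[k]) * 100 for k in price_1971_known)
# The cloth relative must make the four relatives average 160
cloth_relative = cost_of_living_1971 * len(price_1968) - known   # 165
cloth_price_1971 = cloth_relative * price_1968["cloth"] / 100     # Rs. 3.30
rise = cloth_price_1971 - price_1968["cloth"]                     # Rs. 1.30

print(float(cloth_price_1971), float(rise))
```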

Illustration 33. Owing to a change in prices, the consumer price index of the working class in a certain area rose in a month by one quarter of what it was before, to 225. The index of food became 252 from 198, that of clothing rose from 185 to 205, that of fuel and lighting from 175 to 195 and that of miscellaneous from 138 to 212. The index of rent, however, remained unchanged at 150. It was known that the weights of clothing, rent and fuel and lighting were the same. Find out the exact weight of all the groups.  (M. Com., Delhi, 1977)

Solution. Suppose the weights of the groups are as follows:
Food                X        Rent        Z
Fuel and Lighting   Z        Clothing    Z
Miscellaneous       Y

Therefore, the weighted index at the beginning of the month was:

Group               Index (I)   Weight (W)   IW
Food                   198          X        198X
Clothing               185          Z        185Z
Fuel and Lighting      175          Z        175Z
Rent                   150          Z        150Z
Miscellaneous          138          Y        138Y
                           ΣW = X + Y + 3Z   ΣIW = 198X + 138Y + 510Z

$$\text{Index number} = \frac{198X + 138Y + 510Z}{X + Y + 3Z}$$

Similarly, the weighted index at the end of the month was:

Group               Index (I)   Weight (W)   IW
Food                   252          X        252X
Clothing               205          Z        205Z
Fuel and Lighting      195          Z        195Z
Rent                   150          Z        150Z
Miscellaneous          212          Y        212Y
                           ΣW = X + Y + 3Z   ΣIW = 252X + 212Y + 550Z

$$\text{Index number} = \frac{252X + 212Y + 550Z}{X + Y + 3Z}$$
The weighted index at the end of the month was 225 (given). This index exceeds the first index by one quarter of the first index; therefore, the index at the beginning was 4/5 of 225, i.e., 180.

Hence the weighted index at the beginning of the month was

$$180 = \frac{198X + 138Y + 510Z}{X + Y + 3Z}$$

180X + 180Y + 540Z = 198X + 138Y + 510Z
or   18X - 42Y - 30Z = 0              ...(1)

Similarly, the weighted index at the end of the month was

$$225 = \frac{252X + 212Y + 550Z}{X + Y + 3Z}$$

225X + 225Y + 675Z = 252X + 212Y + 550Z
or   27X - 13Y - 125Z = 0             ...(2)

Let the total weight be equal to 100.
Hence   X + Y + 3Z = 100              ...(3)

By solving Eqns. (1), (2) and (3), we get the required answer.

From Eqns. (1) and (3):
   18X - 42Y - 30Z = 0                ...(1)
Multiplying Eqn. (3) by 18:   18X + 18Y + 54Z = 1800
Subtracting:   -60Y - 84Z = -1800
or   60Y + 84Z = 1800                 ...(4)

From Eqns. (2) and (3):
   27X - 13Y - 125Z = 0               ...(2)
Multiplying Eqn. (3) by 27:   27X + 27Y + 81Z = 2700
Subtracting:   -40Y - 206Z = -2700
or   40Y + 206Z = 2700                ...(5)

From Eqns. (4) and (5), multiplying Eqn. (4) by 20 and Eqn. (5) by 30:
   1200Y + 1680Z = 36000
   1200Y + 6180Z = 81000
Subtracting:   -4500Z = -45000   or   Z = 10

Substituting the value of Z in Eqn. (4):
   60Y + (84 × 10) = 1800
   60Y = 1800 - 840 = 960   ∴ Y = 16

Substituting the values of Y and Z in Eqn. (3):
   X + 16 + (3 × 10) = 100
∴ X = 100 - 16 - 30 = 54
Thus the exact weights are : 


Food 54 
Clothing 10 
Fuel and Lighting 10 
Rent 10 
Miscellaneous 16 
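The three equations of this solution can also be set up and solved mechanically. The sketch below is a minimal illustration, assuming a small hand-written Gauss-Jordan routine (`solve`, not a library function) over exact fractions; it builds each coefficient as "group index minus overall index", exactly as Eqns. (1) and (2) were derived above.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square linear system a . x = b by Gauss-Jordan elimination, exactly."""
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into the pivot position
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot becomes 1
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [row[n] for row in m]

before = Fraction(4, 5) * 225   # index at the beginning of the month (180)
after = Fraction(225)           # index at the end of the month
# Unknowns: X = food, Y = miscellaneous, Z = common weight of clothing, fuel and lighting, rent
coefficients = [
    [198 - before, 138 - before, (185 + 175 + 150) - 3 * before],  # 18X - 42Y - 30Z = 0
    [252 - after, 212 - after, (205 + 195 + 150) - 3 * after],     # 27X - 13Y - 125Z = 0
    [1, 1, 3],                                                     # X + Y + 3Z = 100
]
X, Y, Z = solve(coefficients, [0, 0, 100])
print("Food", X, "Misc.", Y, "Clothing = Fuel and Lighting = Rent", Z)
```

The routine raises `StopIteration` if the system is singular; that is acceptable for this sketch, since the system here has a unique solution.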
Illustration 34. Calculate the Cost of Living Index from the following data:

Items           Quantity consumed per     Price per unit (in rupees)
                year in the given year    Base year        Given year
Rice            2½ qtl. × 12              12.00            25.00
Pulses          3 kg. × 12                 0.40             0.60
Oil             2 kg. × 12                 1.50             2.20
Clothing        6 metres × 12              0.75             1.00
Housing         -                         20 per month     30 per month
Miscellaneous   -                         10 per month     15 per month
                                                   (I.C.W.A., July, 1972)

Solution: CALCULATION OF COST OF LIVING INDEX
Items           Quantity consumed in     p0              p1              p1q1      p0q1
                the given year (q1)
Rice            2½ qtl. × 12             12.00           25.00           750.0     360.0
Pulses          3 kg. × 12                0.40            0.60            21.6      14.4
Oil             2 kg. × 12                1.50            2.20            52.8      36.0
Clothing        6 metres × 12             0.75            1.00            72.0      54.0
Housing         -                        20 per month    30 per month    360.0     240.0
Miscellaneous   -                        10 per month    15 per month    180.0     120.0
                                                                         Σp1q1     Σp0q1
                                                                         =1,436.4  =824.4


Since we are given current year quantities, we apply Paasche's method.

$$\text{Cost of Living Index} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{1436.4}{824.4} \times 100 = 174.24$$

Illustration 35. The sub-group indices of the consumer price index number for urban non-manual employees of an industrial centre for a particular year (with base 1960 = 100) were:

Food               200
Clothing           130
Fuel and Light     120
House Rent         150
Miscellaneous      140

The weights are 60, 8, 7, 10 and 15 respectively. It is proposed to fix dearness allowance in such a way as to compensate fully the rise in the price of food and house rent. What should be the dearness allowance expressed as a percentage of wage?  (M. Com., Delhi, 1970)

Solution. Let the income of the consumer be Rs. 100. He spent Rs. 60 on food and Rs. 10 on house rent in 1960. Since the index of food is 200 and that of house rent 150 for the particular year for which the data are given, in order to maintain the same consumption standards regarding these two items he will have to spend Rs. 120 on food and Rs. 15 on house rent. Since the weights of the other items are constant, in order to maintain the same standard he will have to spend 120 + 8 + 7 + 15 + 15 = Rs. 165. Hence the dearness allowance should be 65 per cent.

Illustration 36. Given the data

          Commodities
          A       B
p0        1       1
q0       10       5
p1        2       x
q1        5       2

where p and q respectively stand for price and quantity and the subscripts stand for time period. Find x, if the ratio between Laspeyres' (L) and Paasche's (P) index numbers is:
L : P = 28 : 27                          (M.A. Econ., Delhi, 1970)

Solution: Calculate Laspeyres' and Paasche's indices and equate their ratio to the given ratio in order to determine the value of x.

Commodity   p0   q0   p1   q1   p1q0     p0q0   p1q1     p0q1
A            1   10    2    5   20        10    10        5
B            1    5    x    2   5x         5    2x        2
                               Σp1q0    Σp0q0  Σp1q1    Σp0q1
                               =20+5x   =15    =10+2x   =7

$$\text{Laspeyres' Index: } P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} = \frac{20 + 5x}{15}$$

$$\text{Paasche's Index: } P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} = \frac{10 + 2x}{7}$$

$$\frac{L}{P} = \frac{(20 + 5x)/15}{(10 + 2x)/7} = \frac{28}{27} \quad\Rightarrow\quad \frac{7(20 + 5x)}{15(10 + 2x)} = \frac{28}{27}$$

27 × 7 (20 + 5x) = 28 × 15 (10 + 2x)
3780 + 945x = 4200 + 840x   or   105x = 420   or   x = 4

Note: In order to work with the ratio, 100 has been omitted from the formula.
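Because both indices in Illustration 36 are linear in the unknown price x, the ratio condition reduces to one linear equation after cross-multiplying. A minimal sketch of that reduction, with illustrative variable names (the `a + b*x over s` representation is a device of this example, not of the text):

```python
from fractions import Fraction

p0 = {"A": 1, "B": 1}
q0 = {"A": 10, "B": 5}
q1 = {"A": 5, "B": 2}
p1_A = 2                    # current price of A; the current price of B is the unknown x
ratio = Fraction(28, 27)    # required L : P

# Each index is linear in x: index = (a + b*x) / s
a_L, b_L, s_L = p1_A * q0["A"], q0["B"], p0["A"] * q0["A"] + p0["B"] * q0["B"]  # (20 + 5x) / 15
a_P, b_P, s_P = p1_A * q1["A"], q1["B"], p0["A"] * q1["A"] + p0["B"] * q1["B"]  # (10 + 2x) / 7

# L / P = ratio  <=>  s_P * (a_L + b_L*x) = ratio * s_L * (a_P + b_P*x), solved for x
x = (ratio * s_L * a_P - s_P * a_L) / (s_P * b_L - ratio * s_L * b_P)

laspeyres = (a_L + b_L * x) / s_L
paasche = (a_P + b_P * x) / s_P
print(x, laspeyres / paasche)
```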
Illustration 37. Prepare consumer price index numbers for 1970 and 1971, taking 1969 as base, from the following data:

Group    1969     1970     1971
         (Price in Rupees)
A        20.00    24.00    21.00
B         1.25     1.50     1.00
C         5.00     8.00     8.00
D         2.00     2.25     2.12

Give weights to the four groups as 4, 3, 2 and 1 respectively.  (B. Com., Delhi, 1972)

Solution: CONSUMER PRICE INDEX FOR 1970 WITH 1969 AS BASE
Group   1969 (p0)   1970 (p1)   P = (p1/p0) × 100   W     PW
A        20.00       24.00          120.0            4    480.0
B         1.25        1.50          120.0            3    360.0
C         5.00        8.00          160.0            2    320.0
D         2.00        2.25          112.5            1    112.5
                                                 ΣW = 10   ΣPW = 1,272.5

$$\text{Consumer Price Index} = \frac{\sum PW}{\sum W} = \frac{1272.5}{10} = 127.25$$

CONSUMER PRICE INDEX FOR 1971 WITH 1969 AS BASE
Group   1969 (p0)   1971 (p1)   P = (p1/p0) × 100   W     PW
A        20.00       21.00          105              4    420
B         1.25        1.00           80              3    240
C         5.00        8.00          160              2    320
D         2.00        2.12          106              1    106
                                                 ΣW = 10   ΣPW = 1,086

$$\text{Consumer Price Index} = \frac{\sum PW}{\sum W} = \frac{1086}{10} = 108.6$$


Illustration 38. Compute the general price index from the following information:

Group of items                 Cereals   Other Food   Fuel and Light   House   Clothing   Misc.
Group price index                135        152            124          148       107      139
Average expenditure per
household per month (Rs.)         61         73             19           41        26       82
                                                                (M.A. Econ., Delhi, 1972)

Solution: CONSTRUCTION OF GENERAL PRICE INDEX
Group of items      Group Price   Average expenditure per        IW
                    Index (I)     household per month (W)
Cereals                135               61                   8,235
Other food             152               73                  11,096
Fuel and Lighting      124               19                   2,356
Housing                148               41                   6,068
Clothing               107               26                   2,782
Misc.                  139               82                  11,398
                                     ΣW = 302             ΣIW = 41,935

$$\text{General Price Index} = \frac{\sum IW}{\sum W} = \frac{41935}{302} = 138.86$$


Illustration 39. A textile worker in the city of Bombay earns Rs. 350 per month. The cost of living index for a particular month is given as 136. Using the following data, find out the amounts he spent on house rent and clothing.

Group               Expenditure (Rs.)   Group Index
Food                    140                180
Clothing                 ?                 150
House Rent               ?                 100
Fuel and Lighting        56                110
Miscellaneous            63                 80
                                      (B. Com., Bombay, 1973)

Solution. Let the expenditure on clothing be X and on house rent Y.
   350 = 140 + X + Y + 56 + 63
or X + Y = 91                                   ...(i)

Multiplying each expenditure by its group index, dividing the total by the total expenditure and equating the result to 136, we get

$$136 = \frac{(140 \times 180) + (X \times 150) + (Y \times 100) + (56 \times 110) + (63 \times 80)}{140 + X + Y + 56 + 63} = \frac{25200 + 150X + 100Y + 6160 + 5040}{350}$$

136 × 350 = 36400 + 150X + 100Y
47600 = 36400 + 150X + 100Y
150X + 100Y = 11200                             ...(ii)

Solving the two equations:
   X + Y = 91                                   ...(i)
   150X + 100Y = 11200                          ...(ii)
Multiplying Eqn. (i) by 150:   150X + 150Y = 13650
Subtracting Eqn. (ii):         50Y = 2450   or   Y = 49
Substituting the value of Y in Eqn. (i):
   X + 49 = 91   ∴ X = 42
Hence he spent Rs. 42 on clothing and Rs. 49 on house rent.


Illustration 40. In a working class budget enquiry in towns A and B, it was found in 1968 that an average working class family's expenditure on food and other items was as follows:

                   Town A    Town B
(i)  Food           64%       50%
(ii) Other items    36%       50%

In 1971, the consumer price index stood at 279 for town A and 265 for town B (base year 1968 = 100). It was known that the rise in the prices of all articles consumed by the working class was the same for A and B. What was the 1971 index for (a) food, and (b) other items?  (M. Com., Delhi, 1972)

Solution. Suppose the index number in 1971 for food for both the towns A and B is X and that of other items Y.

For town A the total of weighted relatives would be:
   Food X × 64 = 64X;   Other items Y × 36 = 36Y
But in the problem it is given as 279 × 100 = 27900
∴ 64X + 36Y = 27900                  ...(i)

Similarly, for town B the total of weighted relatives would be:
   Food X × 50 = 50X;   Other items Y × 50 = 50Y
But in the problem it is given as 265 × 100 = 26500
∴ 50X + 50Y = 26500                  ...(ii)

Multiplying Eqn. (i) by 25 and Eqn. (ii) by 32, we get
   1600X + 900Y = 697500
   1600X + 1600Y = 848000
Subtracting:   -700Y = -150500   ∴ Y = 215
By substituting the value of Y in Eqn. (i), we get
   64X + (36 × 215) = 27900   or   64X + 7740 = 27900
   64X = 27900 - 7740 = 20160   or   X = 315
Thus the index for 'food' is 315 and that of 'other items' 215.


Illustration 41. Construct from the following data the weighted relative price index for the year 1959 (Base 1958 = 100):

Items    Unit        Price 1958 (Rs.)   Price 1959 (Rs.)   Weight
Wheat    Kg.             0.50               0.75             2
Milk     Litre           0.60               0.75             5
Eggs     Dozen           2.00               2.40             4
Sugar    Kg.             0.80               1.00             8
Shoes    One pair        8.00              10.00             1
                                   (B.A. Hons. Econ., Delhi, 1973)

Solution: CALCULATION OF WEIGHTED RELATIVE PRICE INDEX
Items    Unit        p0      p1      P = (p1/p0) × 100   W     PW
Wheat    Kg.        0.50    0.75          150            2     300
Milk     Litre      0.60    0.75          125            5     625
Eggs     Dozen      2.00    2.40          120            4     480
Sugar    Kg.        0.80    1.00          125            8   1,000
Shoes    Per pair   8.00   10.00          125            1     125
                                                   ΣW = 20   ΣPW = 2,530

$$\text{Index} = \frac{\sum PW}{\sum W} = \frac{2530}{20} = 126.5$$
Illustration 42. In the working class consumer price index number in a particular town, the weights corresponding to different groups of items were as follows:

Food 55, Fuel 15, Clothing 10, Rent 8, Misc. 12.

In October 1972, the dearness allowance was fixed by a mill of that town at 182 per cent of the workers' wages, which fully compensated for the rise in the prices of food and rent but did not compensate for anything else. Another mill of the same town paid dearness allowance of 46.5 per cent, which compensated for the rise in fuel and miscellaneous groups. It is known that the rise in food is double that of fuel and the rise in the miscellaneous group is double that of rent.

Find the rise in food, fuel, rent and misc. groups.  (M. Com., Delhi, 1973)
Solution. Let the rise in fuel be X; therefore, the rise in food was 2X.
Let the rise in rent be Y; therefore, the rise in misc. was 2Y.

The first mill compensated fully for the rise in food and rent, but not for anything else, by paying 182% D.A., i.e., Rs. 282 to one getting Rs. 100. The index after the rise for this mill will, therefore, be:

Items      Index (I)   Weights (W)   W × I
Food          2X          55          110X
Fuel         100          15          1500
Clothing     100          10          1000
Rent           Y           8            8Y
Misc.        100          12          1200
                     ΣW = 100   ΣWI = 3700 + 110X + 8Y

$$\text{Index} = \frac{3700 + 110X + 8Y}{100} = 282$$

3700 + 110X + 8Y = 28200   or   110X + 8Y = 28200 - 3700 = 24500
∴ 110X + 8Y = 24500                    ...(i)

Similarly, the second mill compensated fully for fuel and miscellaneous by paying 46.5% D.A., i.e., Rs. 146.50 to one getting Rs. 100. The index for this mill after the rise will be:

Items      Index (I)   Weights (W)   W × I
Food         100          55          5500
Fuel           X          15           15X
Clothing     100          10          1000
Rent         100           8           800
Misc.         2Y          12           24Y
                     ΣW = 100   ΣWI = 7300 + 15X + 24Y

$$\text{Index} = \frac{7300 + 15X + 24Y}{100} = 146.5$$

7300 + 15X + 24Y = 14650   or   15X + 24Y = 14650 - 7300
∴ 15X + 24Y = 7350                     ...(ii)

Solving Eqns. (i) and (ii):
   110X + 8Y = 24500                   ...(i)
   15X + 24Y = 7350                    ...(ii)
Multiplying Eqn. (i) by 3:   330X + 24Y = 73500
Subtracting Eqn. (ii):       315X = 66150   ∴ X = 210
Substituting the value of X in Eqn. (i):
   (110 × 210) + 8Y = 24500
   8Y = 24500 - 23100 = 1400   ∴ Y = 175

Hence the rises are as follows:
   Food    2X = 2 × 210 = 420
   Fuel     X = 210
   Rent     Y = 175
   Misc.   2Y = 2 × 175 = 350
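Illustrations 39, 40 and 42 each reduce to two linear equations in two unknowns, which the book solves by eliminating one variable. The same answers follow from Cramer's rule. The sketch below assumes a small helper, `solve_2x2`, written for this example (it is not a library routine) and uses exact fractions so that the results are whole numbers.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule (exact fractions)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the equations do not have a unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# Illustration 39: X + Y = 91, 150X + 100Y = 11200  (clothing, house rent)
clothing, rent = solve_2x2(1, 1, 91, 150, 100, 11200)
# Illustration 40: 64X + 36Y = 27900, 50X + 50Y = 26500  (food, other items)
food, other = solve_2x2(64, 36, 27900, 50, 50, 26500)
# Illustration 42: 110X + 8Y = 24500, 15X + 24Y = 7350  (fuel, rent)
fuel, rent_42 = solve_2x2(110, 8, 24500, 15, 24, 7350)

print(clothing, rent, food, other, fuel, rent_42)
```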
Illustration 43. Calculate an index number of crime for 1971 with 1970 as base:

                      1970    1971    Weight
Robberies               13       8       6
Car thefts              15      22       5
Cycle thefts           249     185       4
Pocket picking         328     259       1
Thefts by servants     497     448       2
                                (B. Com., Kurukshetra, 1974)

Solution: INDEX NUMBER OF CRIME FOR 1971
                      1970    1971    Weight   Crime Relative            RW
                       q0      q1      W       R = (q1/q0) × 100
Robberies               13       8      6      8/13 × 100 = 61.54       369.24
Car thefts              15      22      5      22/15 × 100 = 146.67     733.35
Cycle thefts           249     185      4      185/249 × 100 = 74.30    297.20
Pocket picking         328     259      1      259/328 × 100 = 78.96     78.96
Thefts by servants     497     448      2      448/497 × 100 = 90.14    180.28
                                      ΣW = 18                     ΣRW = 1,659.03

$$\text{Index} = \frac{\sum RW}{\sum W} = \frac{1659.03}{18} = 92.17$$


Illustration 44. The following information relates to workers in an industrial town:

Items of Consumption              Consumer Price Index    Proportion of expenditure
                                  in 1970 (1960 = 100)    on the item (%)
(i)   Food, drinks and tobacco          225                     52
(ii)  Clothing                          175                      8
(iii) Fuel and lighting                 155                     10
(iv)  Housing                           250                     14
(v)   Miscellaneous                     150                     16

The average wage per month in 1960 is Rs. 200. What should be the average wage per worker per month in 1970 in that town so that the standard of living does not fall below the 1960 level?  (B.A. Hons. Econ., Delhi, 1975)

Solution: COMPUTATION OF AGGREGATE INDEX
Items of Consumption          Index (I)   Weight (W)   IW
Food, drinks and tobacco         225          52       11,700
Clothing                         175           8        1,400
Fuel and lighting                155          10        1,550
Housing                          250          14        3,500
Miscellaneous                    150          16        2,400
                                         ΣW = 100   ΣIW = 20,550

$$\text{Index} = \frac{\sum IW}{\sum W} = \frac{20550}{100} = 205.5$$

The average wage per worker per month should be 205.5 × 2 = Rs. 411 in 1970 in order that the workers' standard does not fall below the 1960 level. (We have multiplied the index by 2 because the worker was already getting Rs. 200 in 1960. If he was getting Rs. 100, then the increase should be Rs. 105.50.)


Illustration 45. From the following average prices of the groups of commodities, given in rupees per unit, find chain base index numbers with 1970 as the base year:

Group   1970   1971   1972   1973   1974
I          2      3      4      5      6
II         8     10     12     15     18
III        4      5      8     10     12
                        (B. Com., Kurukshetra, 1975)

Solution: CALCULATION OF CHAIN BASE INDEX NUMBERS WITH 1970 AS BASE
(LR = link relative, i.e., price as a percentage of the preceding year's price)

Group             1970          1971          1972            1973          1974
                Price  LR     Price  LR     Price  LR       Price  LR     Price  LR
I                 2   100       3   150       4   133.33      5   125       6   120
II                8   100      10   125      12   120         15   125      18   120
III               4   100       5   125       8   160         10   125      12   120
Total of LR           300           400           413.33            375           360
Average LR            100           133.33        137.78            125           120

Chain indices (1970 = 100):
1970:  100
1971:  133.33 × 100/100 = 133.33
1972:  137.78 × 133.33/100 = 183.70
1973:  125 × 183.70/100 = 229.63
1974:  120 × 229.63/100 = 275.56

Illustration 46. From the chain base index numbers given below, prepare fixed base index numbers and verify the answers.

Year    1971   1972   1973   1974   1975
Index    110    160    140    200    150
                          (B. Com., Delhi, 1976)

Solution. The formula for converting chain base indices to fixed base indices is as follows:

$$\text{Fixed base index of current year} = \frac{\text{Chain base index of current year} \times \text{Fixed base index of previous year}}{100}$$

Year   Chain base index   Conversion          Fixed base index
1971        110           -                        110.0
1972        160           160 × 110/100            176.0
1973        140           140 × 176/100            246.4
1974        200           200 × 246.4/100          492.8
1975        150           150 × 492.8/100          739.2

Verification. If from the fixed base indices we calculate the chain base indices, the answers should be the same as those given. For example, for 1972 the chain base index would be 176/110 × 100 = 160; for 1973 the chain base index would be 246.4/176 × 100 = 140, etc.
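The conversion and its verification can be sketched in a few lines of Python. This is an illustrative sketch only; the names are chosen for the example, and exact fractions avoid spurious decimals such as 246.40000000000003.

```python
from fractions import Fraction

chain_index = {1971: 110, 1972: 160, 1973: 140, 1974: 200, 1975: 150}
years = sorted(chain_index)

# Fixed base index of a year = chain index of that year x fixed index of previous year / 100
fixed = {}
previous = Fraction(100)  # the first chain index is already on the fixed base
for year in years:
    previous = chain_index[year] * previous / 100
    fixed[year] = previous

# Verification: current fixed index / previous fixed index x 100 gives back the chain index
recovered = {y: fixed[y] / fixed[p] * 100 for p, y in zip(years, years[1:])}

print({y: float(v) for y, v in fixed.items()})
```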

Illustration 47. The following are the prices of commodities in 1970 and 1975. Calculate a price index based on price relatives using the geometric mean.

Commodity    A    B    C    D    E    F
1970        45   60   20   50   85  120
1975        55   70   30   75   90  130
                        (B. Com., Bombay, 1976)

Solution: CALCULATION OF PRICE INDEX
Commodity   Price in 1970   Price in 1975   Price Relative          log P
               (p0)            (p1)         P = (p1/p0) × 100
A               45              55             122.22               2.0871
B               60              70             116.67               2.0670
C               20              30             150.00               2.1761
D               50              75             150.00               2.1761
E               85              90             105.88               2.0248
F              120             130             108.33               2.0347
                                                           Σ log P = 12.5658

$$\text{Price Index} = \text{Antilog}\left(\frac{\sum \log P}{N}\right) = \text{Antilog}\left(\frac{12.5658}{6}\right) = \text{Antilog } 2.0943 = 124.25$$
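The log-table procedure above computes the geometric mean of the price relatives. A minimal sketch that follows the same route (mean of the common logarithms, then the antilog) is shown below; with full-precision logarithms it agrees with the four-figure table result to two decimals.

```python
import math

p0 = {"A": 45, "B": 60, "C": 20, "D": 50, "E": 85, "F": 120}
p1 = {"A": 55, "B": 70, "C": 30, "D": 75, "E": 90, "F": 130}

relatives = [p1[k] / p0[k] * 100 for k in p0]
# Geometric mean = antilog of the arithmetic mean of the common logarithms
mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
index = 10 ** mean_log
print(round(mean_log, 4), round(index, 2))
```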


Illustration 48. It is stated that the Marshall-Edgeworth index number is a good approximation to the ideal index number. Verify, using the following data:

                 1966               1969
Commodity   Price   Qty.       Price   Qty.
A             2      74          3      82
B             5     125          4     140
C             7      40          6      33
                               (B. Com., Bombay, 1975)

Solution: CALCULATION OF FISHER'S IDEAL AND MARSHALL-EDGEWORTH'S INDEX
                1966          1969
Commodity    p0    q0      p1    q1    p1q0   p0q0   p1q1   p0q1
A             2    74       3    82     222    148    246    164
B             5   125       4   140     500    625    560    700
C             7    40       6    33     240    280    198    231
                                       Σp1q0  Σp0q0  Σp1q1  Σp0q1
                                       =962   =1053  =1004  =1095

Fisher's Index:

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{962}{1053} \times \frac{1004}{1095}} \times 100 = \sqrt{0.8377} \times 100 = 0.9152 \times 100 = 91.52$$

Marshall-Edgeworth's Index:

$$P_{01} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100 = \frac{962 + 1004}{1053 + 1095} \times 100 = 0.9153 \times 100 = 91.53$$

The above calculations clearly show that the results obtained by Fisher's method and Marshall-Edgeworth's method are almost the same.
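The comparison can be reproduced with the four aggregate sums. A minimal sketch, assuming the data are stored as `(p0, q0, p1, q1)` tuples per commodity (a layout chosen for this example):

```python
import math

# (p0, q0, p1, q1) for 1966 (base) and 1969 (current)
data = {"A": (2, 74, 3, 82), "B": (5, 125, 4, 140), "C": (7, 40, 6, 33)}

s_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data.values())   # 962
s_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data.values())   # 1053
s_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data.values())   # 1004
s_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data.values())   # 1095

fisher = math.sqrt(s_p1q0 / s_p0q0 * s_p1q1 / s_p0q1) * 100
# Marshall-Edgeworth: quantities of both periods added together as weights
marshall_edgeworth = (s_p1q0 + s_p1q1) / (s_p0q0 + s_p0q1) * 100
print(round(fisher, 2), round(marshall_edgeworth, 2))
```

Both formulas use base- and current-period quantities symmetrically, which is one way to see why the two results lie so close together here.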


Illustration 49. Compute by Fisher's formula the quantity index number from the data given below:

                   1974                         1976
Article   Price (Rs.)   Total Value (Rs.)   Price (Rs.)   Total Value (Rs.)
A             5              50                 4              48
B             8              48                 7              49
C             6              18                 5              20
                                             (B. Com., Delhi, 1977)

Solution: CALCULATION OF QUANTITY INDEX BY FISHER'S METHOD
(The quantities are obtained by dividing total value by price.)
                1974          1976
Article      p0    q0      p1    q1    q1p0   q0p0   q1p1   q0p1
A             5    10       4    12     60     50     48     40
B             8     6       7     7     56     48     49     42
C             6     3       5     4     24     18     20     15
                                      Σq1p0  Σq0p0  Σq1p1  Σq0p1
                                      =140   =116   =117   =97

$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{140}{116} \times \frac{117}{97}} \times 100 = \sqrt{1.4557} \times 100 = 1.2065 \times 100 = 120.65$$
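A quantity index uses Fisher's formula with the roles of price and quantity exchanged: quantities change between the two periods while prices serve as weights. The sketch below derives the quantities from value and price as the solution does; the helper name `s` is illustrative.

```python
import math

# (price, total value) for 1974 and 1976
data_1974 = {"A": (5, 50), "B": (8, 48), "C": (6, 18)}
data_1976 = {"A": (4, 48), "B": (7, 49), "C": (5, 20)}

p0 = {k: p for k, (p, v) in data_1974.items()}
q0 = {k: v / p for k, (p, v) in data_1974.items()}   # quantity = value / price
p1 = {k: p for k, (p, v) in data_1976.items()}
q1 = {k: v / p for k, (p, v) in data_1976.items()}

def s(prices, quantities):
    """Sum of price x quantity over all articles."""
    return sum(prices[k] * quantities[k] for k in prices)

# Fisher's quantity index: quantities vary, prices act as weights
q_index = math.sqrt(s(p0, q1) / s(p0, q0) * s(p1, q1) / s(p1, q0)) * 100
print(round(q_index, 2))
```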
Illustration 50. (a) During a certain period the cost of living index number goes up from 110 to 200 and the salary of a worker is raised from Rs. 325 to Rs. 500. Does the worker really gain, and if so, by how much in real terms?  (C.A., 1975)

Solution. The cost of living index has gone up from 110 to 200. To maintain the same standard of living, the salary of the worker should be increased from Rs. 325 to 325 × 200/110 = Rs. 590.91. His salary has gone up from Rs. 325 to Rs. 500 only. Thus the worker has not really gained. There should be a further increase of Rs. 90.91 (i.e., 590.91 - 500) so that he is placed in the same situation as before, i.e., when the cost of living index was 110.

(b) From the following data, calculate Fisher's Ideal Index. Also, from the results, see if the Time Reversal Test is satisfied.




Item       Price           Quantity
        1960   1970      1960   1970
A         8     20        50     60
B         2      6        15     10
C         1      2        20     25
D         2      5        10      8
E         1      5        40     30
                       (B. Com., Madras, 1977)

Solution: CALCULATION OF FISHER'S IDEAL INDEX
             1960          1970
Item      p0    q0      p1    q1    p1q0   p0q0   p1q1   p0q1
A          8    50      20    60    1000    400   1200    480
B          2    15       6    10      90     30     60     20
C          1    20       2    25      40     20     50     25
D          2    10       5     8      50     20     40     16
E          1    40       5    30     200     40    150     30
                                   Σp1q0  Σp0q0  Σp1q1  Σp0q1
                                   =1380  =510   =1500  =571

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{1380}{510} \times \frac{1500}{571}} \times 100 = \sqrt{7.108} \times 100 = 2.666 \times 100 = 266.6$$

The Time Reversal Test is satisfied when $P_{01} \times P_{10} = 1$, where

$$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{\frac{571}{1500} \times \frac{510}{1380}}$$

$$P_{01} \times P_{10} = \sqrt{\frac{1380}{510} \times \frac{1500}{571} \times \frac{571}{1500} \times \frac{510}{1380}} = \sqrt{1} = 1$$

Hence the given data satisfy the Time Reversal Test.
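The Time Reversal Test can be checked numerically by computing Fisher's index twice, once forward and once with the two periods swapped. A minimal sketch; the function `fisher` below is written for this example and returns the index as a ratio (without the factor 100), as the test requires.

```python
import math

def fisher(p0, q0, p1, q1):
    """Fisher's ideal price index of period 1 on period 0, as a ratio (not x 100)."""
    def total(p, q):
        return sum(a * b for a, b in zip(p, q))
    laspeyres = total(p1, q0) / total(p0, q0)
    paasche = total(p1, q1) / total(p0, q1)
    return math.sqrt(laspeyres * paasche)

# Items A to E of Illustration 50(b)
p_1960, p_1970 = [8, 2, 1, 2, 1], [20, 6, 2, 5, 5]
q_1960, q_1970 = [50, 15, 20, 10, 40], [60, 10, 25, 8, 30]

p01 = fisher(p_1960, q_1960, p_1970, q_1970)   # 1970 on 1960 as base
p10 = fisher(p_1970, q_1970, p_1960, q_1960)   # time reversed: 1960 on 1970 as base
print(round(p01 * 100, 1), p01 * p10)
```

Because every sum in P10 is the reciprocal partner of a sum in P01, the product is 1 apart from floating-point rounding, so the check uses `math.isclose` rather than exact equality.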
Limitations of Index Numbers 


Though the Index Numbers are of great significance, the reader must 
also be aware of their limitations so that he avoids errors of interpreta- 
tion. The chief limitations of index numbers are: 

1. Since index numbers are generally based on a sample, it is not possible to take into account each and every item in the construction of the index.


2. While taking the sample, random sampling is seldom used. This is so because, to sample from a population of literally millions of commodities and services, the random procedure could be neither practical nor representative. Typically, indices are constructed from deliberately selected samples. This is likely to introduce errors, and every effort must be made to minimise these errors.

3. It is often difficult to take into account changes in the quality of products. With the passage of time, the tastes and habits of people also change, with the result that very often old commodities go out of use and new commodities are introduced. In a really representative index, the qualities of commodities should remain the same over a period of time, because differences in quality would also mean differences in prices. But very often this is not practicable, and it makes comparisons over long periods less reliable.

4. A large number of methods are designed for constructing index numbers, and different methods of computation give different results. Very often the selection of an appropriate formula creates problems, and in the interest of comparability it is necessary to ensure that the same formula is adopted over a period of time for constructing a particular index. There is no index number method which is the most satisfactory from all the various points of view which may logically or practically be taken. Index numbers are averages, and all averages are basically compromises between opposing extremes or forces.

5. Just like other statistical tools, index numbers can also be manipulated in such a manner as to draw the desired conclusions. Choosing a freak year as base is a favourite trick of those who use statistics to mislead. A dishonest capitalist could choose a record year of profits as base and so 'prove' subsequent profits to be pitifully low. Similarly, in order to prove that current prices are intolerably high, a dishonest trade unionist may choose a year of exceptionally low prices as base.

6. Since a large number of factual questions are involved in the construction of index numbers, lack of adequate and accurate data in most cases becomes a serious limitation of the index itself. In most cases one cannot collect the data oneself and, therefore, one has to rely on a published source. Ordinarily, we draw upon many sources of data which are geographically dispersed. Problems of comparability and reliability thus multiply, and the chances of spurious results are increased. A single mistake may 'bias' the index, such as including the price of one commodity for one time period and the price of a slightly different commodity for another period, or taking the manufacturer's price at one time and the wholesaler's or retailer's price at another.
Analysis of Time Series


One of the most important tasks before economists and businessmen these days is to make estimates for the future. For example, a businessman is interested in finding out his likely sales in the year 1979 so that he can adjust his production accordingly and avoid the possibility of either unsold stocks or inadequate production to meet the demand. Similarly, an economist is interested in estimating the likely population in the coming year so that proper planning can be carried out with regard to food supply, jobs for the people, etc. However, the first step in making estimates for the future consists of gathering information from the past. In this connection one usually deals with statistical data which are collected, observed or recorded at successive intervals of time. Such data are generally referred to as ‘time series’.* Thus when we observe numerical data at different points of time, the set of observations is known as a time series. For example, if we observe production, sales, population, imports, exports, etc., at different points of time, say, over the last 5 or 10 years, the set of observations formed shall constitute a time series. Hence, in the analysis of time series, time is the most important factor, because the variable is related to time, which may be a year, month, week, day, hour or even minutes or seconds.

In the chapter on graphic presentation we talked of time series. But graphs are simply visual aids to understanding such series; they do not help in the analysis of such series. In the present chapter, we shall discuss the various techniques that are helpful in analysing such series.

It should be noted that the term ‘time series’ is usually used with reference to economic data, and economists are largely responsible for the development of the techniques of time series analysis. However, the term ‘time series’ can apply to all other phenomena that are related to time, such as the number of accidents occurring in a day, the variation in the temperature of a patient during a certain period, the number of marriages taking place during a certain period, etc. In this text, our discussion shall be limited to time series of economic and business data. But these techniques can also be applied to any of the other natural or social sciences.

The problem of time series analysis can best be appreciated with the 
help of the following example : 

The following are the figures of sales of a firm in thousand units:

Year   Sales of Firm A      Year   Sales of Firm A
       (thousand units)            (thousand units)
1970        40              1974        43
1971        42              1975        48
1972        47              1976        65
1973        41              1977        42


*‘A time series is a set of observations taken at specified times, usually at equal intervals. Mathematically, a time series is defined by the values $Y_1, Y_2, \ldots$ of a variable $Y$ (temperature, closing price of a share, etc.) at times $t_1, t_2, \ldots$. Thus $Y$ is a function of $t$, symbolised by $Y = F(t)$.’ (Spiegel: Statistics, p. 283.)

If we observe the above series, we find that generally the sales have increased, but for two years a decline is also noticed. There may be several causes responsible for an increase or decrease from one period to another, such as changes in the tastes and habits of people, growth of population, availability of alternative products, etc. It may be very difficult to study the effect of the various factors that have led either to an increase or to a decrease in sales. The statistician, therefore, tries to analyse the effect of the various forces under four broad heads:

(1) Changes that have occurred as a result of general tendency of the 
data to increase or decrease, known as ‘secular movements’. 

(2) Changes that have taken place during a period of 12 months as a result of changes in climate, weather conditions, festivals, etc. Such changes are called ‘seasonal variations’.

(3) Changes that have taken place as a result of booms and dep- 
ressions. Such changes are classified under the head ‘cyclical variations’. 

(4) Changes that have taken place as a result of such forces that could 
not be predicted like floods, earthquakes, famines, etc. Such changes are 
classified under the head ‘irregular or erratic variations’. 

These are called components of time series and shall be discussed 
in detail. 

Utility of Time Series Analysis 

The analysis of time series is of great significance not only to the
economist and businessman but also to the scientist, astronomer, geo-
logist, sociologist, biologist, research worker, etc., for the following
reasons :

(1) It helps in understanding past behaviour. By observing data over
a period of time one can easily understand what changes have taken place
in the past. Such analysis will be extremely helpful in predicting
future behaviour.

(2) It helps in planning future operations. Plans for the future can-
not be made without forecasting events and the relationships they will have.
Statistical techniques have been evolved which enable time series to be
analysed in such a way that the influences which have determined the form
of that series may be ascertained. If the regularity of occurrence of any
feature over a sufficiently long period could be clearly established then,
within limits, prediction of probable future variations would become possible.

(3) It helps in evaluating current accomplishments. The actual perfor-
mance can be compared with the expected performance and the cause of
variation analysed. For example, if expected sales for 1978 were 10,000
refrigerators and the actual sales were only 9,000, one can investigate
the cause of the shortfall in achievement. Time series analysis will en-
able us to apply the scientific procedure of ‘holding other things con-
stant’ as we examine one variable at a time. For example, if we know
how great the effect of seasonality on business is, we may devise ways
and means of ironing out the seasonal influence or decreasing it by pro-
ducing commodities with complementary seasons.

(4) It facilitates comparison. Different time series are often com- 
pared and important conclusions drawn therefrom. 

However, one should not be led to believe that by time series ana-
lysis one can foretell with 100 per cent accuracy the course of future events.
After all, statisticians are not fortune-tellers. This would be possible
only if the influences of the various forces which affect these series, such
as climate, customs and traditions, growth and decline factors and the
complex forces which produce the business cycle, were regular
in their operation. However, the facts of life reveal that this type of
regularity does not exist. But this does not mean that time series
analysis is of no value. When such analysis is coupled with a careful
examination of current business indicators, one can undoubtedly improve
substantially upon guesstimates (i.e. estimates based upon pure guess work)
in forecasting future business conditions.
COMPONENTS OF TIME SERIES

It is customary to classify the fluctuations of a time series into four 
basic types of variations, which superimposed and acting all in concert, 
account for changes in the series over a period of time. These four types 
of patterns, movements, or, as they are often called, components or ele- 
ments of a time series, are : 

(1) Secular Trend. 

(2) Seasonal Variation. 

(3) Cyclical Variation. 

(4) Irregular Variation. 


Please look at the following graph showing the sales of Coca Cola
for the years 1958 to 1972.

The original data in this graph are represented by curve (a). The
general movement persisting over a long period of time, represented by the
diagonal line (b) drawn through the irregular curve, is called the secular trend.

Next, if we study the irregular curve year by year, we see that in
each year the curve starts with a low figure, reaches a peak about the
middle of the year and then decreases again. This type of fluctuation,
which completes the whole sequence of changes within the span of a year
and has about the same pattern year after year, is called a seasonal
variation.

Furthermore, looking at the broken curve superimposed on the
original irregular curve, we find pronounced fluctuations moving up and
down every few years throughout the length of the chart. These are known
as business cycles or cyclical fluctuations. They are so called because they
comprise a series of repeated sequences just as a wheel goes round and
round.

Finally, the little saw-tooth irregularities on the original curve 
represent what are referred to as irregular movements. 

In traditional or classical time series analysis, it is ordinarily 
assumed that there is a multiplicative relationship between these four 
components, that is, it is assumed that any particular value in a series 
is the product of factors that can be attributed to the various components. 
Symbolically : 

$$O = T \times S \times C \times I$$

where O denotes the original data; T = trend component; S = seasonal
component; C = cyclical component; I = irregular component.
Another approach is to treat each observation of a time series as the sum 
of these four components. Symbolically : 

$$O = T + S + C + I$$

To prevent confusion between the two models, it should be pointed
out that in the multiplicative model S, C and I are indexes expressed as
decimal fractions (for example, 1.2 for 120 per cent). In the additive model
S, C and I are quantitative deviations about trend, expressed in the same
units as the original data, that are seasonal, cyclical and irregular in nature.

Example. If in the multiplicative model T = 400, S = 1.5, C = 1.2 and
I = 0.8, then

$$O = T \times S \times C \times I = 400 \times 1.5 \times 1.2 \times 0.8 = 576$$

If in the additive model T = 400, S = +120, C = +20 and I = -40, then

$$O = T + S + C + I = 400 + 120 + 20 - 40 = 500$$
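The arithmetic of the example can be reproduced with a minimal Python sketch. The function names `multiplicative` and `additive` are illustrative only; in the multiplicative version S, C and I are passed as ratios (1.5 for 150 per cent), in the additive version as deviations in the units of the data.

```python
def multiplicative(t, s, c, i):
    """O = T x S x C x I, with S, C and I given as ratios (1.0 = 100 per cent)."""
    return t * s * c * i


def additive(t, s, c, i):
    """O = T + S + C + I, with S, C and I given in the same units as T."""
    return t + s + c + i


o_mult = multiplicative(400, 1.5, 1.2, 0.8)
o_add = additive(400, 120, 20, -40)

# round() hides tiny floating-point error in the product of decimals.
print(round(o_mult, 6), o_add)  # 576.0 500
```

The same trend value of 400 is used in both calls, so the difference in O comes entirely from how the other three components are combined.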


The additive model assumes that all the components of the time series are
independent of one another. For example, it assumes that trend has no
effect on the seasonal component no matter how high or low the trend value
may become. Further, it assumes that the business cycle has no effect on
the seasonal component. If the index for December is typically 1.50 or
150%, this per cent will not be affected by either prosperity or recession.

While the additive model may work well within limits, it is doubtful
whether one can always rely on the independence of components that it assumes.

In the multiplicative model, it is assumed that the four components 
are due to different causes but they are not necessarily independent and 
they can affect one another. 

There is little agreement amongst experts about the validity of the
different assumptions. Some feel that the given classification is too crude
and that there are more than four types of movements. Nothing specific
is really known about how the components are related, how they combine
to produce particular effects, or whether they are really separable. The
effects of the various components might be additive, multiplicative or
combined in any one of an indefinitely large number of other ways*.
Different models (assumptions or theories) will lead to different results.
Although the additive assumption is undoubtedly true in some cases, the
multiplicative assumption characterizes the majority of economic time
series. Consequently the multiplicative model is not only considered the
standard or traditional assumption for time series analysis but is also
employed in practice more often than all other possible models combined.
For this reason, we shall use only the multiplicative model in our subsequent
discussion.

*There are numerous variations of these two basic models. Two such variations
are $O = TCS + I$ and $O = TC + SI$.

The task of performing a time series analysis, just like the analysis
of a chemist in breaking a substance into its constituent parts, is to operate
on the data in such a way as to bring out separately each of the com-
ponents present.


1. Trend


The term ‘trend’ is very commonly used in day-to-day conversation.
For example, we often talk of a rising trend of population, prices, etc.
Trend, also called secular or long-term trend, is the basic tendency of pro-
duction, sales, income, employment, etc., to grow or decline over a period
of time. The concept of trend does not include short-range oscillations
but rather steady movements over a long time.

What causes this growth or decline ? In economic time series, growth
in population is a main cause. The presence of more people means that
more food, clothing and housing are necessary. Technological changes, dis-
covery and exhaustion of natural resources, mass production methods, im-
provements in business organisation, and government intervention in the
economy are other major causes for the growth or decline of many econo-
mic time series. In some cases, growth in one series involves decline in an-
other, for example, the displacement of silk by rayon, or of bullock-carts by
other modes of transport like trucks, tempos, etc. Similarly, better medical
facilities, improved sanitation, diet, etc., on the one hand reduce the death
rate and on the other contribute to a rise in the birth rate.

There are all sorts of trends. Some series increase slowly and some
increase fast, others decrease at varying rates, some remain relatively
constant for long periods of time, and some after a period of growth or
decline reverse themselves and enter a period of decline or growth. Broadly
speaking, the various types of trends are divided under two heads :

(1) Linear or Straight Line Trends ; and 

(2) Non-linear Trends.

For a proper understanding of the meaning of trend, the reader's 
attention is directed to the following two points : 

(i) The meaning of long term. When we say that secular trend
refers to the general tendency of the data to grow or decline over a long
period of time, one may wish to know what constitutes a long period of
time. Does it mean several years ? Not necessarily. Whether a particular
period can be regarded as long or not in the study of secular trend depends
upon the nature of the data. To take an example, if we are studying the
figures of sales of a firm for 1977 and 1978 and we find that in 1978 the
sales have gone up, this increase cannot be called a secular trend because
this is too short a period of time to conclude that the sales are showing
an increasing tendency. On the other hand, if we put a strong germicide
into a bacterial culture and count the number of organisms still alive
after each 10 seconds for 8 minutes, these 48 observations showing a
general pattern would be called a secular movement. It is clear from this
example that in one case 2 years could not be regarded as a long period
whereas in another case even 8 minutes constituted a long period. Hence
the nature of the data dictates whether a particular period should be
called long or not.

Generally speaking, the longer the period covered, the more signi- 
ficant the trend. When the period is short, the secular movements cannot 
be expected to reveal themselves clearly and the general drift of the series 
may be unduly influenced by the cyclical fluctuations. This would make
it difficult to separate the various types of variation in the time series. As a
minimum safeguard, it may be said that to compute trend the period must
cover at least two or three complete cycles.

(ii) Another point worth mentioning is that, for concluding whether
the data show an upward or a downward tendency, it is
not necessary that the rise or fall should continue in the same direction
throughout the period. We have to observe the general tendency of the 
data. As long as we can say that the period as a whole was characterized 
by an upward movement or by a downward movement, we say that a 
secular trend was present. For example, if we observe the trend of prices 
over a period of 20 years and find that except for a year or two the 
prices are continuously rising, we would call it a secular rise in prices. 


2. Seasonal Variations 

Seasonal variations are those periodic movements in business activity 
which occur regularly every year and have their origin in the nature of the 
year itself. Since these variations repeat during a period of 12 months 
they can be predicted fairly accurately. Nearly every type of business
activity is susceptible to seasonal influence to a greater or lesser degree, and
as such these variations are regarded as a normal phenomenon recurring
every year. Although the word ‘seasonal’ seems to imply a connection
with the seasons of the year, the term is meant to include any kind of
variation which is of a periodic nature and whose repeating cycles are of
relatively short duration. Seasonal variation is evident when the data are
recorded at weekly, monthly or quarterly intervals. Although the
amplitude of seasonal variations may vary, their period is fixed at one
year. As a result, seasonal variations do not appear in series of annual
figures. The factors that cause seasonal variations are :

(i) Climate and weather conditions. The most important factor
causing seasonal variations is the climate. Changes in the climate and
weather conditions such as rainfall, humidity, heat, etc., act on different
products and industries differently. For example, during winter there is
greater demand for woollen clothes, hot drinks, etc., whereas in summer
cotton clothes and cold drinks have a greater sale. Agriculture is influenced
very much by the climate. The effect of the climate is that there are gene- 
rally two seasons in agriculture—the growing season and harvesting 
season—which directly affect the income of the farmer which in turn 
affects the entire business activity. 

(ii) Customs, traditions and habits. Though nature is primarily res-
ponsible for seasonal variations in time series, customs, traditions and
habits also have their impact. For example, on certain occasions like
Deepawali, Dussehra, Christmas, etc., there is a big demand for sweets, and
there is also a large demand for cash before the festivals because people want
money for shopping and gifts. Similarly, on the first of every month
there are heavy withdrawals, and bankers have to keep a lot of cash
to meet the possible demand on the basis of the previous month's experience.
To take another example, most students buy books in the first few
months after schools and colleges open, and thus the sale of books,
stationery, etc., shows seasonal swings.
The study and measurement of seasonal patterns constitute a very im-
portant part of the analysis of a time series. In some cases, seasonal patterns
themselves are of primary concern because little, if any, intelligent plan-
ning or scheduling (of production, inventory, personnel, advertising and the
like) can be done without a knowledge based on adequate statistical mea-
sures of seasonal patterns. In other cases the seasonal variation may not be
of immediate concern, but it must be measured to facilitate the study
of the other types of variation. An accurate knowledge of seasonal behaviour
is an aid in mitigating and ironing out seasonal movements through business
policy. This may be done by introducing diversified products having diffe-
rent seasonal peaks, accumulating stock in slack seasons in order to
manufacture at a more regular rate, cutting prices in slack seasons and
advertising off-season uses for the products. Seasonal indexes are also
helpful in scheduling purchases, inventory control, personnel require-
ments, seasonal financing and selling and advertising programmes.
For example, a housewife may buy fruits for canning or preserving at the
peak of the season when the prices are low and quality high. Seasonal fluc-
tuations may also be ironed out in order that the intra-year fluctuations
may be less pronounced. Thus, attempts were made in the U.S.A. to build
up winter demand for ice-cream by advertising “Ice cream is one of your
best foods. Eat one plate a day.”
3. Cyclical Variations 


The term ‘cycle’ refers to the recurrent variations in time series that
usually last longer than a year and are regular neither in amplitude nor in
length.

Most of the time series relating to economics and business show
some kind of cyclical variation. Cyclical fluctuations are long-term move-
ments that represent consistently recurring rises and declines in activity.
A business cycle* consists of the recurrence of the up-and-down move-
ments of business activity from some sort of statistical trend or “normal”.
By “normal” we mean some kind of statistical average; we do not mean
that there is anything very permanent or special. There are four well-
defined periods or phases in the business cycle, namely : (i) prosperity,
(ii) decline, (iii) depression, and (iv) improvement.

Each phase changes gradually into the phase which follows it
in the order given. The following diagram would illustrate a cycle. In
the prosperity phase of the business cycle the public is optimistic: busi-
ness is booming, prices are high and profits are easily made. There is a
considerable expansion of business activity which leads to over-develop-
ment. It is then difficult to secure deliveries and there is a shortage of
transportation facilities, which tends to cause large inventories
to be accumulated during the time of highest prices. Wages increase and
labour efficiency decreases. The strong demand for money causes interest
rates to rise to a high level, while doubt enters the bankers' minds as to
the advisability of granting further loans. This situation causes business-
men to make price concessions in order to secure the necessary cash. Then
follows the expectation of further reductions and the situation becomes
worse instead of better. Buyers wait for lower prices and all this leads
to a decline in business activity. Then follows a period of pessimism in
trade and industry; factories close, businesses fail, there is widespread
unemployment, while wages and prices are low. These conditions charac-
terize the period of depression. After a period of rigid economy, liqui-
dation and reorganisation, money accumulates and seeks a use. Then
follows a period of increasing business activity with rising prices, a period
of improvement or recovery. The improvement period generally develops
into the prosperity period and a business cycle is completed. The move-
ments discussed above are constantly repeated in the order given as the
cycle completes its swing.

[Figure: Phases of Business Cycle]

* “Business cycles are a type of fluctuation found in the aggregate economic
activity of nations that organize their work mainly in business enterprises: a cycle
consists of expansions occurring at about the same time in many economic activities,
followed by similarly general recessions, contractions, and revivals which merge into the
expansion phase of the next cycle; this sequence of changes is recurrent but not
periodic; in duration business cycles vary from more than one year to ten or twelve
years; they are not divisible into shorter cycles of similar character with amplitudes
approximating their own.” (Arthur Burns and Wesley Mitchell.)

The study of cyclical variations is extremely useful in forming
suitable policies for stabilizing the level of business activity, i.e., for
avoiding periods of booms and depressions, as both are bad for an
economy, particularly depression, which brings about a complete disaster
and shatters the economy.

But despite the great importance of measuring cyclical variations,
they are the most difficult type of economic fluctuation to measure. This is
because of the following two reasons :

(i) Business cycles do not show regular periodicity: they differ
widely in timing, amplitude and pattern, which makes their study very
tough and tedious.

(ii) The cyclical variations are mixed with erratic, random or
irregular forces, which make it impracticable to isolate separately the effects
of cyclical and irregular forces.
Business cycles are distinguished from seasonal variations in the
following respects :

(i) The cyclical variations are of longer duration than a year. A
business cycle may be of any duration, but normally the period of a business
cycle is 2 to 10 years. Moreover, they do not ordinarily exhibit regular
periodicity, as successive cycles vary widely in timing, amplitude and
pattern. For example, the 23 cycles of general business in the United States
between 1854 and 1949 averaged 49 months in duration, but individual cycles
differed greatly: the shortest lasted only 29 months and the longest
persisted for 99 months.


(ii) The fluctuations in a business cycle result from a different set of
causes. The periods of prosperity, decline, depression and improvement,
viewed as the four phases of a business cycle, are generated by factors other
than weather, social customs and the other factors which create seasonal patterns.


4. Irregular Variations* 


Irregular variations refer to such variations in business activity which 
do not repeat in a definite pattern. In fact the category labelled irregular 
variation is really intended to include all types of variations other than 
those accounting for the trend, seasonal and cyclical movements. These 
latter three, if they are actually at work, act in such a way as to produce
certain systematic effects. Irregular movements, on the other hand, are 
considered to be largely random, being the result of chance factors which, 
like those determining the fall of a coin, are wholly unpredictable. 

Irregular variations are caused by such isolated special occurrences as 
floods, earthquakes, strikes and wars. Sudden changes in demand or very 
rapid technological progress may also be included in this category. By their 
very nature these movements are very irregular and unpredictable. Quanti-
tatively it is almost impossible to separate out the irregular movements
from the cyclical movements. Therefore, while analysing time series, the
trend and seasonal variations are measured separately and the cyclical and
irregular variations are left together.


There are two reasons for recognizing irregular movements :

(i) To suggest that on occasions it may be possible to explain certain
movements in the data as due to specific causes and so to simplify further
analysis.

(ii) To emphasise the fact that predictions of economic conditions
are always subject to a degree of error owing to the unpredictable erratic
influences which may enter.


Problems of Classification 


Although it is a simple matter to classify the factors affecting time
series into these four groups for analytical purposes, the actual application
of the classification frequently presents serious problems. Seasonal
variations are by no means always so uniform in amplitude and timing
that their identification can be made with certainty. Consequently, the
investigator is often hard put to distinguish seasonal influences from
cyclical or random factors. Long and severe cycles may, to some observers,
appear to be changes in the direction of the regular trend. During the
great depression of the 1930's, for example, many leading economists
interpreted the existing conditions not as a cyclical depression but as
“secular stagnation”.

Another difficulty arises because the four components of time
series data are not mutually independent of one another. An exceedingly
severe seasonal influence may aggravate or even precipitate a change in
the cyclical movement. Conversely, cyclical influence may seriously affect
the seasonal. A very rapidly rising trend virtually eliminates seasonal and
cyclical variations.


*Irregular variations are also called ‘erratic’, ‘random’ or ‘accidental’ variations.
Finally, the fourfold breakdown of time series data when applied to
general economic conditions has frequently been challenged on analytical
grounds. Bratt* sees not one trend but two: a primary trend representing
the long-term growth of productive capacity, and the drift away from it,
which he calls the secondary trend. Schumpeter† developed an even more
detailed breakdown by identifying three cyclical components: the 3-year
Kitchin cycle, the 10-year Juglar cycle and the 50-year Kondratieff cycle.
The divergence of opinion among eminent scholars indicates clearly that
the fourfold breakdown is a mere approximation, convenient to employ
but frequently subject to modification.


Preliminary Adjustments before Analysing Time Series 


Before beginning the actual work of analysing a time series, it is
necessary to make certain adjustments in the raw data. The adjustments
may be needed for :


1. Calendar Variation. 
2. Population Changes. 
3. Price Changes. 

4. Comparability. 


1. Calendar Variation. A vast proportion of the important time
series is available in monthly form, and it is necessary to recognise that
the month is a variable time unit. The actual length of the shortest month
is about 10 per cent less than that of the longest (28 days against 31), and
if we take into account holidays and weekends, the variation may be even
greater. Thus, the production or sales for the month of February may be
less, not because of any real drop in activity, but because February has
fewer days. The purpose of adjusting for calendar variation is, therefore, to
eliminate certain spurious differences which are caused by peculiarities of
our calendar. The adjustment for calendar variation is made by dividing
each monthly total by the number of days in the month (sometimes by the
number of working days in the month), thus arriving at a daily average for
each month. Comparable (adjusted) monthly data may then be obtained
by multiplying each of these values by 30.4167 (= 365/12), the average number
of days in a month. (In a leap year this factor is 366/12 = 30.5.)
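The adjustment just described can be sketched in Python with the standard `calendar` module, which supplies the number of days in a month and tells whether a year is a leap year. This is a minimal illustration under stated assumptions: it divides by calendar days only (the working-day variant would need a list of holidays, which the text does not give), the function name `calendar_adjust` is hypothetical, and the February and March totals are invented figures chosen so that both months have the same daily rate.

```python
import calendar


def calendar_adjust(total, year, month):
    """Adjust a monthly total to a month of average length.

    The total is divided by the number of calendar days in the month to get
    a daily average, which is then multiplied by the average month length:
    365/12 (about 30.4167) days, or 366/12 = 30.5 days in a leap year.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    average_month = (366 if calendar.isleap(year) else 365) / 12
    return total / days_in_month * average_month


# Hypothetical figures: the same daily rate (100 units a day) in both months.
feb = calendar_adjust(2800, 1977, 2)   # 28 days
mar = calendar_adjust(3100, 1977, 3)   # 31 days
print(round(feb, 2), round(mar, 2))    # 3041.67 3041.67
```

Unadjusted, March (3,100) looks about 11 per cent higher than February (2,800); after adjustment the two months are identical, showing that the apparent difference came only from the calendar.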


2. Population Changes. Certain types of data call for adjustment
for population changes. Changes in the size of population can easily
distort comparisons of income, production and consumption figures. For
example, national income may be increasing year after year, but per capita
income may be declining because of greater pressure of population. Simi-
larly, the production of a commodity may be going up but the per capita
consumption may be declining. In such cases, where it is necessary to adjust
data for population changes, a very simple procedure is followed, i.e., the
data are expressed on a per capita basis by dividing the original figures
by the appropriate population totals.
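The per capita adjustment amounts to one division per period. The Python sketch below is illustrative only: the helper name `per_capita` is hypothetical, and the income and population figures are invented to show the case described above, where total income rises while income per head falls.

```python
def per_capita(values, populations):
    """Divide each figure by the population of the same period."""
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    return [v / p for v, p in zip(values, populations)]


# Hypothetical figures: national income (Rs. crore) and population (crore).
income = [1000, 1050, 1100]
population = [50, 53, 56]
income_per_head = per_capita(income, population)   # Rs. per head
print([round(x, 2) for x in income_per_head])  # [20.0, 19.81, 19.64]
```

Total income grows by 10 per cent over the three periods, yet income per head declines, because population grows faster.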

*Elmer C. Bratt: Business Cycles and Forecasting.
†J. A. Schumpeter: Business Cycles.

3. Price Changes. An adjustment for price changes is necessary
whenever we have a value series and are interested in quantity changes
alone. Because of rising prices, the total sale proceeds may go up even
when there is a fall in the number of units sold. For example, if in 1975,
1,000 units of a commodity priced at Rs. 10 are sold, the total sale
proceeds would be 1,000 × 10 = Rs. 10,000. Now suppose in 1976 the price
of the commodity increases from Rs. 10 to Rs. 11. If the sales do not de-
cline, the total sale proceeds will be 1,000 × 11 = Rs. 11,000. This increase
in sale proceeds, i.e. Rs. 1,000, is not due to an increase in the demand for the
commodity but purely to the rise in price from Rs. 10 to Rs. 11.
Since value is equal to price per unit multiplied by the number of units
sold, the effect of price changes can be eliminated by dividing each item in
a value series by an appropriate price index. This, in fact, is the process
of deflating, which has been discussed in the chapter on Index Numbers.
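Deflation can be sketched in a few lines of Python. The assumptions are as follows. The price index is built from the single commodity in the example (Rs. 10 rising to Rs. 11, i.e. 110 with 1975 = 100), whereas in practice an appropriate published index would be used. The helper name `deflate` is hypothetical.

```python
def deflate(values, price_index, base=100):
    """Express each value at base-period prices: value * base / index."""
    return [v * base / p for v, p in zip(values, price_index)]


# Sale proceeds (Rs.) from the example: 1,000 units at Rs. 10, then at Rs. 11.
proceeds = [10_000, 11_000]   # 1975, 1976
index = [100, 110]            # price index with 1975 = 100
print(deflate(proceeds, index))  # [10000.0, 10000.0]
```

Once the price rise is removed, the deflated series is flat, confirming that the Rs. 1,000 increase reflected price alone and not quantity sold.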

4. Comparability. For any meaningful analysis of time series, it is
necessary to see that the data are strictly comparable throughout the time
period under investigation. Quite often it is difficult or even impossible
to get strictly comparable data. For example, if we are observing a
phenomenon over the last 25 years, the comparability may be affected by
differences in definition, differences in geographical coverage, differences
in the method adopted, changes in the method of reporting, etc. Thus,
a sales figure for January 1976 may give the average for that
month, while some years later the corresponding sales figure may give the total
for the month, or perhaps sales on the 15th or last day of the month. If
such changes are not taken into account, the data cannot strictly
be compared and their analysis would lead to fallacious conclusions.


MEASUREMENT OF TREND 


Given any long-term series, we wish to determine and present the 
direction which it takes—is it growing or declining ? There are two 
important reasons for trend measurement. 

1. To find out trend characteristics in and of themselves. In studying
trend in and of itself, we ascertain the growth factor. For example, we
can compare the growth in the textile industry with the growth in the
economy as a whole or with the growth in other industries, or we can
compare the growth in one firm of the textile industry with the growth in
the industry as a whole. Moreover, we can compare through trend
characteristics the growth of the textile industry in India with that of other
countries. The growth factor also helps us in predicting the future
behaviour of the data. If a trend can be determined, the rate of
change can be ascertained and tentative estimates concerning the future made
accordingly.

2. To enable us to eliminate trend in order to study other elements.
The elimination of trend leaves us with seasonal, cyclical and irregular
factors. We can then, in two or more series, compare or use the impact
of these three relatively short-term elements divorced from the long-term
factor.
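Under the multiplicative model, eliminating trend means dividing each original value by its trend value, which leaves S × C × I; under the additive model it means subtracting the trend, which leaves S + C + I. The minimal Python sketch below reuses the figures of the earlier Example (T = 400); the function names are hypothetical, and in a real analysis the trend values would first have to be estimated by one of the methods listed next.

```python
def detrend_multiplicative(original, trend):
    """O / T, which leaves S x C x I (as ratios) under the multiplicative model."""
    return [o / t for o, t in zip(original, trend)]


def detrend_additive(original, trend):
    """O - T, which leaves S + C + I under the additive model."""
    return [o - t for o, t in zip(original, trend)]


# Figures from the earlier Example: O = 576 (multiplicative), O = 500 (additive), T = 400.
sci = detrend_multiplicative([576], [400])
s_plus_c_plus_i = detrend_additive([500], [400])
print(sci, s_plus_c_plus_i)  # [1.44] [100]
```

The ratio 1.44 equals 1.5 × 1.2 × 0.8, and the difference 100 equals 120 + 20 - 40, so in each model the remainder is exactly the combined short-term components.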


The various methods that can be used for determining trend are : 
1. Freehand or graphic method. 

2. Semi-average method. 

3. Moving average method. 

4. Method of least squares.

Each of these methods is discussed below.
1. Freehand or Graphic Method

This is the simplest method of studying trend. The procedure of 
obtaining a straight line trend by this method is given below. 

1. Plot the time series on a graph.

2. Examine carefully the direction of the trend based on the
plotted information (dots).

3. Draw a straight line which best fits the data according to
personal judgment. The line now shows the direction of the trend.

It is clear from the above that there is no formal statistical criterion
whereby the adequacy of such a line can be judged and it all depends on
the judgment of the statistician. However, as a rough guide the line
should be so drawn that it passes between the plotted points in such a 
manner that the fluctuations in one direction are approximately equal to 
those in the other direction and that it shows a general movement. 

When a trend line is fitted by the freehand method, an attempt
should be made to make it conform as much as possible to the following
conditions*:

1. It should be smooth: either a straight line or a combination of
long gradual curves.

2. The sum of the vertical deviations from the trend of the annual
observations above the trend should equal the sum of the vertical devia-
tions from the trend of the observations below the trend.

3. The sum of the squares of the vertical deviations of the obser-
vations from the trend should be as small as possible.

4. The trend should bisect the cycles so that the area above the trend 
equals that below the trend, not only for the entire series but as much 
as possible for each full cycle. This last condition cannot always be 
met fully, but a careful attempt should be made to observe it as closely 
as possible. 
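Although the freehand line is drawn by eye, conditions 2 and 3 can be checked arithmetically once a candidate line has been drawn. The Python sketch below does this for the steel figures of Illustration 1. It assumes the drawn line has been read off the graph as a straight line; the intercept and slope used (21 million tonnes in 1967, rising 0.5 a year) are hypothetical values chosen only for illustration, not a line given in the text, and `check_freehand_line` is our own name.

```python
def check_freehand_line(years, values, intercept, slope, origin):
    """Check a candidate line y = intercept + slope * (year - origin).

    Returns (sum of deviations above the line, sum of deviations below it,
    sum of squared deviations), i.e. the quantities in conditions 2 and 3.
    """
    above = below = sse = 0.0
    for year, y in zip(years, values):
        deviation = y - (intercept + slope * (year - origin))
        if deviation > 0:
            above += deviation
        else:
            below += deviation  # zero or negative deviations
        sse += deviation ** 2
    return above, below, sse


years = list(range(1967, 1976))
steel = [20, 22, 24, 21, 23, 25, 23, 26, 25]  # Illustration 1, million tonnes

# Hypothetical line read off a freehand drawing.
above, below, sse = check_freehand_line(years, steel, intercept=21.0, slope=0.5, origin=1967)
print(f"above = {above}, below = {below}, sum of squares = {sse}")
```

For this hypothetical line the deviations above the line total 5.5 while those below total only -3.5, so by condition 2 the line should be moved up a little and checked again.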

Illustration 1. Fit a trend line to the following data by the freehand method:

Year    Production of steel        Year    Production of steel
        (in million tonnes)                (in million tonnes)
1967    20                         1972    25
1968    22                         1973    23
1969    24                         1974    26
1970    21                         1975    25
1971    23


Solution:

[Figure: Trend by the freehand method; production (million tonnes) against years, with the freehand trend line.]

* Tuttle : Elementary Business and Economic Statistics, p. 477. 
The trend line drawn by the freehand method can be extended to
predict future values. However, since freehand curve fitting is too
subjective, this method should not be used as a basis for predictions.


Merits and Limitations 
Merits. 1. This is the simplest method of measuring trend. 


2. This method is very flexible in that it can be used regardless of
whether the trend is a straight line or a curve.

3. The trend line drawn by a statistician experienced in computing
trend and having knowledge of the economic history of the concern or
the industry under analysis may be a better expression of the secular movement
than a trend fitted by the use of a rigid mathematical formula which,
while providing a good fit to the points, may have no other logical justification.
In fact a specialist of long experience who is familiar with the institutional
setting, history and behaviour of the series may well be able to fit a trend
manually that is superior to one derived by mathematical means. Although
the freehand method is not recommended for beginners, it has considerable
merit in the hands of experienced statisticians and is widely used in
applied situations.

Limitations. 1. This method is highly subjective because the trend 
line depends on the personal judgment of the investigator and, therefore, 
different persons may draw different trend lines from the same set of data. 
Moreover, the work cannot be left to clerks and it must be handled by 
skilled and experienced people who are well conversant with the history of 
the particular concern. 

2. Since freehand curve fitting is subjective it cannot have much 
value if it is used as a basis for predictions. 

3. Although this method appears simple and direct, in actuality,
as experienced statisticians can verify, it is very time-consuming to construct
a freehand trend if a careful and conscientious job is done.


It is only after long experience in trend fitting that a statistician 
should attempt to fit a trend line by inspection. 
2. Method of Semi-averages 

When this method is used, the given data are divided into two parts,
preferably with the same number of years. For example, if we are given
data from 1958 to 1975, i.e., over a period of 18 years, the two equal
parts will be the first nine years, i.e., from 1958 to 1966, and the last nine
years, from 1967 to 1975. In case of an odd number of years like 9, 13, 17, etc.,
two equal parts can be made simply by omitting the middle year. For example,
if data are given for 19 years from 1957 to 1975, the two equal parts would be
from 1957 to 1965 and from 1967 to 1975; the middle year 1966 will be
omitted.


After the data have been divided into two parts, an average (arith- 
metic mean) of each part is obtained. We thus get two points. Each 
point is plotted at the mid-point of the class interval covered by the 
respective part and then the two points are joined by a straight line which 
gives us the required trend line. The line can be extended downwards 
or upwards to get intermediate values or to predict future values. 


The following example shall illustrate the application of the method:
Illustration 2. Fit a trend line to the following data by the method of semi-
averages:

Year    Sales of Firm A         Year    Sales of Firm A
        (Thousand Units)                (Thousand Units)
1969    102                     1973    108
1970    105                     1974    116
1971    114                     1975    112
1972    110


Solution. Since seven years are given, the middle year shall be left out and an
average of the first three years and of the last three years shall be obtained. The average
of the first three years is $\frac{102+105+114}{3} = \frac{321}{3} = 107$ and the average of the last three
years is $\frac{108+116+112}{3} = \frac{336}{3} = 112$. Thus we get two points, 107 and 112, which shall
be plotted corresponding to their respective middle years, i.e., 1970 and 1974. By joining
these two points we shall obtain the required trend line. The line can be extended and
can be used either for prediction or for determining intermediate values.

The actual data and the trend line are shown on the following graph:

[Figure: Trend by the method of semi-averages.]
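The semi-average procedure is mechanical enough to script. The Python sketch below (the function names `semi_average_trend` and `trend_value` are our own) drops the middle year when the number of periods is odd, averages each half, and places each average at the mean of the periods it covers. For an even half this point falls midway between the two middle periods, which is how the text places the March-April and September-October averages of Illustration 4 below.

```python
from statistics import mean


def semi_average_trend(periods, values):
    """Return the two plotted points and the slope of a semi-average trend line.

    For an odd number of periods the middle one is dropped, as in the text.
    Each half-average is placed at the mean of the periods in that half.
    """
    if len(periods) != len(values) or len(periods) < 2:
        raise ValueError("need matching periods and values, at least two of each")
    half = len(periods) // 2
    first = (mean(periods[:half]), mean(values[:half]))
    second = (mean(periods[-half:]), mean(values[-half:]))
    slope = (second[1] - first[1]) / (second[0] - first[0])
    return first, second, slope


def trend_value(point, slope, period):
    """Read the straight semi-average line at `period` (to interpolate or extend it)."""
    x0, y0 = point
    return y0 + slope * (period - x0)


# Illustration 2: seven years, so the middle year 1972 is omitted.
years = list(range(1969, 1976))
sales = [102, 105, 114, 110, 108, 116, 112]
p1, p2, slope = semi_average_trend(years, sales)
print(p1, p2, slope)
print(trend_value(p1, slope, 1975))

# Illustration 4: months numbered 1 to 12, so the points fall at 3.5 and 9.5.
monthly = [280, 300, 280, 280, 270, 240, 230, 230, 220, 200, 210, 200]
print(semi_average_trend(list(range(1, 13)), monthly))
```

The slope of 1.25 (thousand units a year) obtained for Illustration 2 simply restates the line joining 107 in 1970 and 112 in 1974.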


Even Number of Years

When there is an even number of years like 6, 8, 10, etc., two equal
parts can easily be formed and an average of each part obtained. However,
when the average is to be centered there would be some problem in
case the number of years is 8, 12, etc. For example, if the data relate
to 1974, 1975, 1976 and 1977, which would be the middle year? In such
a case the average will be centered corresponding to 1st July 1975, i.e., the
middle of 1975 and 1976. The following example shall illustrate the point:


Illustration 3. Fit a trend line by the method of semi-averages to the data given
below. Estimate the population for 1976. If the actual figure for that year is 520
million, account for the difference between the two figures.

Year    Population       Year    Population
        (in million)             (in million)
1968    412              1972    470
1969    438              1973    482
1970    444              1974    490
1971    454              1975    500

Solution. The average of the first four years is $\frac{1748}{4} = 437$ and that of the last four
years is $\frac{1942}{4} = 485.5$. These two points shall be taken corresponding to the middle periods, i.e.,
1st July 1969 and 1st July 1973.


Illustration 4. The sale of a commodity in tonnes varied from January, 1977 to
December, 1977 in the following manner:

280 300 280 280 270 240
230 230 220 200 210 200

Fit a trend line by the method of semi-averages.
Solution: CALCULATION OF TREND VALUES BY THE METHOD
OF SEMI-AVERAGES

Month       Sales in tonnes      Month        Sales in tonnes
January     280                  July         230
February    300                  August       230
March       280                  September    220
April       280                  October      200
May         270                  November     210
June        240                  December     200
Total of first six months 1,650  Total of last six months 1,290

Average of the first half = $\frac{1{,}650}{6} = 275$ tonnes.

Average of the second half = $\frac{1{,}290}{6} = 215$ tonnes.

These two figures, namely 275 and 215, shall be plotted at the middle of their
respective periods, i.e., at the middle of March-April and that of September-October,
1977. By joining these two points we get a trend line which describes the given data.
Merits and Limitations

Merits. 1. This method is simple to understand compared to the
moving average method and the method of least squares.

2. This is an objective method of measuring trend, as everyone
who applies the method is bound to get the same result (leaving
aside arithmetical mistakes).


Limitations. 1. This method assumes a straight-line relationship
between the plotted points regardless of whether that relationship
exists or not.

2. The limitations of the arithmetic average automatically apply.
If there are extremes in either half or both halves of the series, then the
trend line will not give a true picture of the growth factor. This danger is
greatest when the time period represented by the average is small.
Consequently, trend values obtained are not precise enough for the purpose
of forecasting the future trend or of eliminating trend from the original
data.

For the above reasons, if arithmetic averages of the data are to
be used in estimating the secular movement, it is sometimes better to
use moving averages than semi-averages.


3. Method of Moving Averages* 

When a trend is to be determined by the method of moving averages,
the average value for a number of years (or months or weeks) is
secured, and this average is taken as the normal or trend value for the
unit of time falling at the middle of the period covered in the calculation
of the average. The effect of averaging is to give a smoother curve, lessening
the influence of the fluctuations that pull the annual figures away
from the general trend.

While applying this method, it is necessary to select a period for the
moving average, such as a 3-yearly, 5-yearly or 8-yearly moving average, etc.
The period of the moving average is to be decided in the light of the length
of the cycle. Since the moving average method is most commonly applied to
data which are characterised by cyclical movements, it is necessary to select
a period for the moving average which coincides with the length of the cycle;
otherwise the cycle will not be entirely removed. This danger is more severe
the shorter the time period represented by the average. When the period of the
moving average and the period of the cycle do not coincide, the moving average
will display a cycle which has the same period as the cycle in the data,
but which has less amplitude than the cycle in the data. Often we find
that the cycles in the data are not of uniform length. In such a case we
should take a moving average period equal to or somewhat greater than
the average period of the cycle in the data. Ordinarily the necessary
period will range between three and ten years for general business series,
but even longer periods are required for certain types of data.

* This method is not restricted to determining trend; it can also be used in
connection with seasonal variations, cyclical variations and irregular variations.
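The formula for the 3-yearly moving average will be:

$$\frac{a+b+c}{3},\quad \frac{b+c+d}{3},\quad \frac{c+d+e}{3},\quad \frac{d+e+f}{3},\ \ldots$$

and for the 5-yearly moving average:

$$\frac{a+b+c+d+e}{5},\quad \frac{b+c+d+e+f}{5},\quad \frac{c+d+e+f+g}{5},\ \ldots$$

where a, b, c, ... denote the values for successive years.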
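The formulas above can be computed with a running total: each new moving total is the previous one plus the incoming value minus the outgoing one, which mirrors how the moving-total columns in the following tables are built. This is a minimal Python sketch; `moving_average` is our own name, and it is checked against the figures of Illustrations 5 and 6.

```python
def moving_average(values, period):
    """Return the simple `period`-term moving averages of `values`.

    The k-th result averages values[k : k + period]; for an odd period it is
    the trend value of the middle item, values[k + period // 2].
    """
    if not 1 <= period <= len(values):
        raise ValueError("period must be between 1 and len(values)")
    total = sum(values[:period])                # first moving total
    averages = [total / period]
    for i in range(period, len(values)):
        total += values[i] - values[i - period]  # slide the window one period
        averages.append(total / period)
    return averages


# Illustration 5: sales, 1960-1974 (thousand units)
sales = [5, 7, 9, 12, 11, 10, 8, 12, 13, 17, 19, 14, 13, 12, 15]
three_yearly = moving_average(sales, 3)         # trend values for 1961-1973
print([round(v, 2) for v in three_yearly])
```

Because the totals are kept as integers and divided only at the end, no rounding error accumulates along the series.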


Illustration 5. Calculate trend values by 3-yearly moving average from the
following data:

Year    Sales                  Year    Sales
        (Thousand Units)               (Thousand Units)
1960    5                      1968    13
1961    7                      1969    17
1962    9                      1970    19
1963    12                     1971    14
1964    11                     1972    13
1965    10                     1973    12
1966    8                      1974    15
1967    12

(B.A. Hons, Econ., Kurukshetra, 1975)

Solution: CALCULATION OF TREND VALUES

Year    Sales              3-yearly    3-yearly
        (Thousand Units)   totals      moving average
1960    5                  -           -
1961    7                  21          7.00
1962    9                  28          9.33
1963    12                 32          10.67
1964    11                 33          11.00
1965    10                 29          9.67
1966    8                  30          10.00
1967    12                 33          11.00
1968    13                 42          14.00
1969    17                 49          16.33
1970    19                 50          16.67
1971    14                 46          15.33
1972    13                 39          13.00
1973    12                 40          13.33
1974    15                 -           -


Illustration 6. Calculate 5-yearly and 7-yearly moving averages for the following data
of the number of commercial and industrial failures in a country during 1959-1974:
Year    No. of failures     Year    No. of failures
1959    23                  1967    9
1960    26                  1968    13
1961    28                  1969    11
1962    32                  1970    14
1963    20                  1971    12
1964    12                  1972    9
1965    12                  1973    3
1966    10                  1974    1

Also plot the actual and trend values on a graph.

Solution: CALCULATION OF 5-YEARLY AND 7-YEARLY MOVING AVERAGES

Year   No. of     5-yearly   5-yearly   7-yearly   7-yearly
       failures   moving     moving     moving     moving
                  totals     average    totals     average
1959   23         -          -          -          -
1960   26         -          -          -          -
1961   28         129        25.8       -          -
1962   32         118        23.6       153        21.9
1963   20         104        20.8       140        20.0
1964   12         86         17.2       123        17.6
1965   12         63         12.6       108        15.4
1966   10         56         11.2       87         12.4
1967   9          55         11.0       81         11.6
1968   13         57         11.4       81         11.6
1969   11         59         11.8       78         11.1
1970   14         59         11.8       71         10.1
1971   12         49         9.8        63         9.0
1972   9          39         7.8        -          -
1973   3          -          -          -          -
1974   1          -          -          -          -


[Figure: Actual data and the 5-yearly and 7-yearly moving averages of the number of failures, plotted against years 1959-1974.]


Even Period and Moving Average

If the moving average is an even-period moving average, say, four-yearly
or six-yearly, the moving total and moving average, which are placed at the
centre of the time span from which they are computed, fall between two time
periods. This placement is inconvenient, since the moving average so placed
would not coincide with an original time period. We, therefore, synchronise
moving averages and original data. This process is called centering* and
always consists of taking a two-period moving average of the moving averages.
Illustration 7. Work out the 'centered 4-yearly moving average' for the following
data:

Year    Tonnage of cargo cleared     Year    Tonnage of cargo cleared
1964    1102                         1970    1452
1965    1250                         1971    1549
1966    1180                         1972    1586
1967    1340                         1973    1476
1968    1212                         1974    1624
1969    1317                         1975    1586

Solution: CALCULATION OF THE CENTERED FOUR-YEARLY MOVING AVERAGE

Year   Tonnage of      4-yearly        4-yearly         4-yearly centered
       cargo cleared   moving totals   moving average   moving average
1964   1102            -               -                -
1965   1250            -               -                -
                       4872            1218.00
1966   1180                                             1231.75
                       4982            1245.50
1967   1340                                             1253.87
                       5049            1262.25
1968   1212                                             1296.25
                       5321            1330.25
1969   1317                                             1356.37
                       5530            1382.50
1970   1452                                             1429.25
                       5904            1476.00
1971   1549                                             1495.87
                       6063            1515.75
1972   1586                                             1537.25
                       6235            1558.75
1973   1476                                             1563.37
                       6272            1568.00
1974   1624            -               -                -
1975   1586            -               -                -
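Centering can be expressed in a few lines. The Python sketch below (function names are our own) implements both routes described in the text: a two-period average of the 4-yearly moving averages, and the footnote's alternative of taking 2-yearly totals of the 4-yearly totals and dividing by 8. It checks that the two agree on the cargo figures of Illustration 7.

```python
def moving_totals(values, period):
    """Successive `period`-term totals of `values`."""
    return [sum(values[i:i + period]) for i in range(len(values) - period + 1)]


def centred_moving_average(values, period):
    """Even-period moving average centred by a two-period average of the averages.

    The k-th result is the trend value for values[k + period // 2].
    """
    if period < 2 or period % 2:
        raise ValueError("centering applies to an even period")
    averages = [t / period for t in moving_totals(values, period)]
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]


def centred_moving_average_via_totals(values, period):
    """Footnote method: 2-period totals of the moving totals, divided by 2 * period."""
    totals = moving_totals(values, period)
    return [(s + t) / (2 * period) for s, t in zip(totals, totals[1:])]


# Illustration 7: tonnage of cargo cleared, 1964-1975
cargo = [1102, 1250, 1180, 1340, 1212, 1317, 1452, 1549, 1586, 1476, 1624, 1586]
centred = centred_moving_average(cargo, 4)   # trend values for 1966-1973
print(centred)
print(centred_moving_average_via_totals(cargo, 4) == centred)
```

The exact values include 1253.875, 1356.375, 1495.875 and 1563.375, which the table above prints to two decimals by dropping the last digit.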


Illustration 8. Assuming a four-yearly cycle, calculate the trend by the method of
moving averages from the following data relating to the production of tea in India:

Year    Production      Year    Production
        (mn. lb.)               (mn. lb.)
1966    464             1971    540
1967    515             1972    557
1968    518             1973    571
1969    467             1974    586
1970    502             1975    612


* There is another method of centering the moving averages. If we are calculating
a 4-yearly moving average, we will first take four-yearly totals, then take 2-yearly
totals of these totals, and divide these by 8.
Solution: CALCULATION OF TREND BY THE MOVING AVERAGE METHOD

Year   Production   4-yearly        4-yearly         4-yearly moving
       (mn. lb.)    moving totals   moving average   average centered
1966   464          -               -                -
1967   515          -               -                -
                    1964            491.00
1968   518                                           495.75
                    2002            500.50
1969   467                                           503.62
                    2027            506.75
1970   502                                           511.62
                    2066            516.50
1971   540                                           529.50
                    2170            542.50
1972   557                                           553.00
                    2254            563.50
1973   571                                           572.50
                    2326            581.50
1974   586          -               -                -
1975   612          -               -                -


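Year                    1960  1961  1962  1963  1964  1965  1966  1967
Cyclical fluctuations    +2    +1     0    -2    -1    +2    +1     0

Year                    1968  1969  1970  1971  1972  1973  1974
Cyclical fluctuations    -2    -1    +2    +1     0    -2    -1

Solution: CALCULATION OF THREE-YEARLY, FIVE-YEARLY
AND SEVEN-YEARLY MOVING AVERAGES

Year   Cyclical       3-yearly          5-yearly          7-yearly
       fluctuations   moving averages   moving averages   moving averages
1960   +2             -                 -                 -
1961   +1             +1.00             -                 -
1962    0             -0.33             0                 -
1963   -2             -1.00             0                 +0.43
1964   -1             -0.33             0                 +0.14
1965   +2             +0.67             0                 -0.29
1966   +1             +1.00             0                 -0.43
1967    0             -0.33             0                 +0.14
1968   -2             -1.00             0                 +0.43
1969   -1             -0.33             0                 +0.14
1970   +2             +0.67             0                 -0.29
1971   +1             +1.00             0                 -0.43
1972    0             -0.33             0                 -
1973   -2             -1.00             -                 -
1974   -1             -                 -                 -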
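The point of this example, that a moving average whose period equals the length of a strictly periodic cycle removes it completely, while other periods leave a damped cycle behind, can be checked directly. The Python sketch assumes the fluctuations repeat exactly the five-year pattern +2, +1, 0, -2, -1 shown in the table; `moving_average` is our own helper.

```python
def moving_average(values, period):
    """Simple moving averages; the k-th covers values[k : k + period]."""
    return [sum(values[k:k + period]) / period for k in range(len(values) - period + 1)]


# Cyclical fluctuations for 1960-1974: the five-year pattern repeated three times.
cycle = [2, 1, 0, -2, -1] * 3

for period in (3, 5, 7):
    print(period, [round(v, 2) for v in moving_average(cycle, period)])
```

Only the 5-yearly average is identically zero; the 3-yearly and 7-yearly averages still oscillate with the same five-year period but a smaller amplitude, as described in the discussion of choosing the period.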
[Figure: Trend by the method of moving averages.]


Merits and Limitations

Merits. 1. This method is simple as compared to the method of
least squares.

2. It is a flexible method of measuring trend for the reason that if
a few more figures are added to the data, the entire calculations are not
changed; we only get some more trend values.

3. If the period of moving average happens to coincide with the
period of cyclical fluctuations in the data, such fluctuations are automatically
eliminated.

4. The moving average has the advantage that it follows the general
movements of the data and that its shape is determined by the data rather
than by the statistician's choice of a mathematical function.

5. It is particularly effective if the trend of a series is very irregular.


Limitations. 1. Trend values cannot be computed for all the years.
The longer the period of moving average, the greater the number of years
for which trend values cannot be obtained. For example, in a three-yearly
moving average, trend values cannot be obtained for the first year and the last
year; in a five-yearly moving average, for the first two years and the last
two years; and so on. It is often these extreme years in which we are
most interested.

2. Great care has to be exercised in selecting the period of moving
average. No hard and fast rules are available for the choice of the period,
and one has to use one's own judgment.

3. Since the moving average is not represented by a mathematical
function, this method cannot be used in forecasting, which is one of the
main objectives of trend analysis.


4. Although theoretically we say that if the period of moving
average happens to coincide with the period of the cycle, the cyclical
fluctuations are completely eliminated, in practice, since the cycles are by
no means perfectly periodic, the lengths of the various cycles in any given
series will usually vary considerably and, therefore, no moving average
can completely remove the cycle. The best results would be obtained by
a moving average whose period was equal to the average length of all the
cycles in the given series. However, it is difficult to determine the average
length of the cycle until the cycles are isolated from the series.

5. Finally, when the trend situation is not linear (a straight line) the 
moving average lies either above or below the true sweep of the data. 
Consequently, the moving average is appropriate for trend computations 
only when : 

(a) the purpose of investigation does not call for current analysis or 
forecasting, 

(b) the trend is linear, and 

(c) the cyclical variations are regular both in period and amplitudes. 

Unfortunately, these conditions are encountered very infrequently. 

4. The Method of Least Squares

This method is the most widely used in practice. It is a mathematical
method, and with its help a trend line is fitted to the data in such a manner
that the following two conditions are satisfied:

(1) $\Sigma(Y - Y_c) = 0$,
i.e., the sum of the deviations of the actual values of Y from the computed
values of Y is zero;

(2) $\Sigma(Y - Y_c)^2$ is least,
i.e., the sum of the squares of the deviations of the actual values from the
computed values is least for this line. That is why this method is called the
method of least squares. The line obtained by this method is known as
'the line of best fit'.

The method of least squares may be used either to fit a straight line
trend or a parabolic trend.

The straight line trend is represented by the equation

$$Y_c = a + bX$$

where $Y_c$ is used to designate the trend values to distinguish them from
the actual Y values, a is the Y intercept or the value of the Y variable
when X = 0, and b represents the slope of the line or the amount of change in
the Y variable that is associated with a change of one unit in the X variable.
The X variable in time series analysis represents time.

Whenever we fit any straight line trend by the least squares method, 
three things should be specified : 

(1) Which year was selected as the origin ? 

(2) What is the unit of time represented by X? Is it half a year, one
year or five years?

(3) In what kind of units is Y being measured ? Is it production in 
tonnes, sales in rupees, prices in rupees, employment in thousands of 
workers ? 

In order to determine the values of the constants a and b, the
following two normal equations are to be solved:

$$\Sigma Y = Na + b\Sigma X \qquad \text{(i)}$$

$$\Sigma XY = a\Sigma X + b\Sigma X^2 \qquad \text{(ii)}$$

where N represents the number of years (months or any other period) for
which data are given.
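The two normal equations can be solved directly for any choice of origin. The Python sketch below (the name `fit_straight_line` is ours) applies Cramer's rule to equations (i) and (ii); it is checked on the production figures of Illustration 10 with the origin at the first year, 1961, and at the middle year, 1964.

```python
def fit_straight_line(x, y):
    """Least-squares trend Yc = a + b*X from the two normal equations.

    Solves   sum(Y)  = N*a      + b*sum(X)
             sum(XY) = a*sum(X) + b*sum(X^2)
    by Cramer's rule, so X may be measured from any origin.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct X values")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Illustration 10: production ('000 maunds), 1961-1967
production = [80, 90, 92, 83, 94, 99, 92]
print(fit_straight_line(range(0, 7), production))    # origin 1961: (84.0, 2.0)
print(fit_straight_line(range(-3, 4), production))   # origin 1964: (90.0, 2.0)
```

Moving the origin changes a but not b, so both equations produce the same trend values.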


It should be noted that the first equation is merely the summation of
the given function, and the second is the summation of X multiplied into the
given function.

We can measure the time variable X from any point of time as origin,
such as the first year. But the calculations are very much simplified when
the mid-point in time is taken as the origin, because in that case the
negative values in the first half of the series balance out the positive values
in the second half so that $\Sigma X = 0$. In other words, the time variable is
measured as a deviation from its mean. Since $\Sigma X = 0$, the above two
normal equations take the form

$$\Sigma Y = Na \qquad \text{(i)}$$

$$\Sigma XY = b\Sigma X^2 \qquad \text{(ii)}$$

The values of a and b can now be determined easily:

$$a = \frac{\Sigma Y}{N}, \qquad b = \frac{\Sigma XY}{\Sigma X^2}$$

The constant a gives the arithmetic mean of Y and the constant b
indicates the rate of change.


It should be noted that in the case of an odd number of years, when the
deviations are taken from the middle year, $\Sigma X$ would always be zero. In the
case of an even number of years also, $\Sigma X$ will be zero if the X origin is placed
midway between the two middle years. For example, if the years are 1969, 1970,
1971, 1972, 1973, 1974, we can take deviations from the mid-point 1971.5.
Thus the deviations would be -2.5, -1.5, -0.5, +0.5, +1.5, +2.5 for the
various years and the total $\Sigma X$ would be zero. Hence both in odd as well
as in even numbers of years we can use the simple procedure of determining
the values of the constants a and b.
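The shortcut can be sketched in Python by measuring X from the mean year, which gives whole-number deviations for an odd number of years and half-year deviations such as -2.5 ... +2.5 for an even number. `fit_from_middle` is our own name; the check uses the production data of Illustration 10 (odd) and the earnings data of Illustration 12 (even).

```python
def fit_from_middle(years, y):
    """Straight-line trend with X measured from the middle of the series.

    X is each year's deviation from the mean year, so sum(X) = 0 and the
    normal equations reduce to a = sum(Y)/N and b = sum(XY)/sum(X^2).
    Returns (origin, a, b); b is the change per year.
    """
    n = len(years)
    origin = sum(years) / n
    x = [yr - origin for yr in years]
    a = sum(y) / n
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return origin, a, b


# Odd number of years (Illustration 10): the origin is the middle year, 1964.
print(fit_from_middle(list(range(1961, 1968)), [80, 90, 92, 83, 94, 99, 92]))
# Even number of years (Illustration 12): the origin falls midway, at 1966.5.
print(fit_from_middle(list(range(1963, 1971)), [38, 40, 65, 72, 69, 60, 87, 95]))
```

Because X is kept in years here, the even-year case returns the yearly increment directly (about 7.333) rather than a half-yearly one.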


Illustration 10. Below are given the figures of production (in thousand maunds):

Year                        1961  1962  1963  1964  1965  1966  1967
Production ('000 maunds)      80    90    92    83    94    99    92

(i) Fit a straight line trend to these figures.
(ii) Plot these figures on a graph and show the trend line.
(B. Com., Mysore, 1968; B. Com., Andhra, 1972)


Solution: (i) FITTING THE STRAIGHT LINE TREND

Year   Production     X     XY     X²   Trend values
       ('000 mds.)                      Y_c
       Y
1961   80            -3   -240     9    84
1962   90            -2   -180     4    86
1963   92            -1    -92     1    88
1964   83             0      0     0    90
1965   94            +1    +94     1    92
1966   99            +2   +198     4    94
1967   92            +3   +276     9    96
N = 7  ΣY = 630      ΣX = 0   ΣXY = 56   ΣX² = 28   ΣY_c = 630
The equation of the straight line trend is $Y_c = a + bX$.
Since $\Sigma X = 0$, $a = \frac{\Sigma Y}{N}$ and $b = \frac{\Sigma XY}{\Sigma X^2}$.
Here $\Sigma Y = 630$, $N = 7$, $\Sigma XY = 56$, $\Sigma X^2 = 28$.

$$a = \frac{630}{7} = 90 \quad\text{and}\quad b = \frac{56}{28} = 2$$

Hence the equation of the straight line trend is $Y_c = 90 + 2X$.
Origin, 1964; X units, one year; Y units, production in thousand mds.
For X = -3, $Y_c = 90 + 2(-3) = 84$.
For X = -2, $Y_c = 90 + 2(-2) = 86$.
For X = -1, $Y_c = 90 + 2(-1) = 88$.
Similarly, by putting X = 0, 1, 2, 3 we can obtain the other trend values. However,
since the value of b is constant, only the first trend value need be obtained; then, if
b is positive, we may continue adding the value of b to every preceding value. For
example, in the above case the calculated value of $Y_c$ for 1961 is 84. For 1962 it will be
84 + 2 = 86; for 1963 it will be 86 + 2 = 88, and so on. If b is negative, then instead of
adding we will deduct.


(ii) The graph of the above data is given below:

[Figure: Linear trend by the method of least squares.]
If instead of the middle year we take the first year as origin, the solution would
be as follows:

Year   Production     X    XY      X²   Y_c
       ('000 mds.)
       Y
1961   80             0      0      0   84
1962   90             1     90      1   86
1963   92             2    184      4   88
1964   83             3    249      9   90
1965   94             4    376     16   92
1966   99             5    495     25   94
1967   92             6    552     36   96
N = 7  ΣY = 630   ΣX = 21   ΣXY = 1,946   ΣX² = 91   ΣY_c = 630

$Y_c = a + bX$; $\Sigma Y = Na + b\Sigma X$; $\Sigma XY = a\Sigma X + b\Sigma X^2$.
Substituting the values:

$$630 = 7a + 21b \qquad \text{(i)}$$

$$1946 = 21a + 91b \qquad \text{(ii)}$$
Multiplying Eqn. (i) by 3:

$$1890 = 21a + 63b$$
$$1946 = 21a + 91b$$

Subtracting the second from the first: $-28b = -56$, or $b = 2$.
Substituting the value of b in Eqn. (i): $630 = 7a + 21(2)$

$$7a = 630 - 42 = 588 \quad\text{or}\quad a = 84$$

Thus the equation is $Y_c = 84 + 2X$.
Origin, 1961; X units, one year; Y units, production in thousand mds.
For X = 0, $Y_c = 84$.
Note: The difference in the two equations is because of the difference in origin.
In the first case 1964 was taken as origin, whereas in the second case 1961 was taken as
origin. However, the trend values are the same.
Illustration 11. Eliminate the trend from the following time series (assuming a
linear trend):

Year                        1965  1966  1967  1968  1969  1970  1971
No. of production units      125   128   133   135   140   141   143
(B.A. Hons, Econ., Delhi, 1975)

Solution. ELIMINATION OF TREND

Year   No. of        Deviations   X²   XY     Trend values   (Y - Y_c)
       production    from 1968                Y_c
       units Y       X
1965   125           -3           9    -375   125.679        -0.679
1966   128           -2           4    -256   128.786        -0.786
1967   133           -1           1    -133   131.893        +1.107
1968   135            0           0       0   135.000         0
1969   140           +1           1    +140   138.107        +1.893
1970   141           +2           4    +282   141.214        -0.214
1971   143           +3           9    +429   144.321        -1.321
N = 7  ΣY = 945      ΣX = 0   ΣX² = 28   ΣXY = 87   ΣY_c = 945

$Y_c = a + bX$; $a = \frac{\Sigma Y}{N} = \frac{945}{7} = 135$;

$b = \frac{\Sigma XY}{\Sigma X^2} = \frac{87}{28} = 3.107$; $Y_c = 135 + 3.107X$.

When X = -3, $Y_c$ will be

$Y_c = 135 + 3.107(-3) = 125.679$.
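The elimination of trend in Illustration 11 can be reproduced in Python as a check on the table. The sketch fits the line with the middle-origin shortcut, subtracts the trend values, and confirms condition (1) of the least squares method, that the detrended values $Y - Y_c$ sum to zero. The variable names are our own; b is kept unrounded (87/28), which gives the same three-decimal residuals as the table.

```python
years = list(range(1965, 1972))
units = [125, 128, 133, 135, 140, 141, 143]

# Deviations from the middle year 1968, so sum(X) = 0.
x = [yr - 1968 for yr in years]
a = sum(units) / len(units)                                              # 945 / 7
b = sum(xi * yi for xi, yi in zip(x, units)) / sum(xi * xi for xi in x)  # 87 / 28

trend = [a + b * xi for xi in x]                          # Y_c
detrended = [yi - ti for yi, ti in zip(units, trend)]     # Y - Y_c

for yr, ti, di in zip(years, trend, detrended):
    print(yr, round(ti, 3), round(di, 3))
print("sum of Y - Yc:", round(sum(detrended), 9))
```

What remains after subtracting the trend is the combined effect of the other components of the series.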

Illustration 12. Fit a straight line trend by the method of least squares to the
following data. Assuming that the same rate of change continues, what would be the
predicted earnings for the year 1972?

Year                      1963  1964  1965  1966  1967  1968  1969  1970
Earnings (Rs. in lakhs)     38    40    65    72    69    60    87    95

Do not plot the trend values on the graph.
(B. Com., Bombay, 1971; B. Com., Delhi, 1972; B. Com., Punjab, 1974)

Solution. FITTING OF STRAIGHT LINE TREND BY THE METHOD OF LEAST SQUARES

Year   Earnings   Deviations     Deviations         XY     X²
       (lakhs)    from 1966.5    multiplied by 2
       Y                         X
1963   38         -3.5           -7                -266    49
1964   40         -2.5           -5                -200    25
1965   65         -1.5           -3                -195     9
1966   72         -0.5           -1                 -72     1
1967   69         +0.5           +1                 +69     1
1968   60         +1.5           +3                +180     9
1969   87         +2.5           +5                +435    25
1970   95         +3.5           +7                +665    49
N = 8  ΣY = 526                  ΣX = 0   ΣXY = 616   ΣX² = 168
$Y_c = a + bX$; $a = \frac{\Sigma Y}{N} = \frac{526}{8} = 65.75$;

$b = \frac{\Sigma XY}{\Sigma X^2} = \frac{616}{168} = 3.667$; $Y_c = 65.75 + 3.667X$.

For 1972, X will be +11 (1972 is 5.5 years after the origin 1966.5, and the deviations have been doubled).

When X is +11, $Y_c$ will be

$Y_c = 65.75 + 3.667(11) = 65.75 + 40.337 = 106.09$.

Thus the estimated earnings for the year 1972 are Rs. 106.09 lakhs.

The same result will be obtained if we do not multiply the deviations by 2,
but in that case our computations would be more difficult, as will be seen below:

Year   Earnings   Deviation      XY        X²
       (lakhs)    from 1966.5
       Y          X
1963   38         -3.5          -133.00   12.25
1964   40         -2.5          -100.00    6.25
1965   65         -1.5           -97.50    2.25
1966   72         -0.5           -36.00    0.25
1967   69         +0.5           +34.50    0.25
1968   60         +1.5           +90.00    2.25
1969   87         +2.5          +217.50    6.25
1970   95         +3.5          +332.50   12.25
N = 8  ΣY = 526   ΣX = 0        ΣXY = 308  ΣX² = 42.00

$a = \frac{\Sigma Y}{N} = \frac{526}{8} = 65.75$; $b = \frac{\Sigma XY}{\Sigma X^2} = \frac{308}{42} = 7.333$

The advantage of this method is that the value of b gives the annual increment of
change rather than the 6-monthly increment, as in the first method discussed above. Hence
we will not have to double the value of b to obtain the yearly increment. It is clear from
the above illustration that the value of b in the first case is half of what we obtain from
the second method (b was 3.667 in the first case and 7.333 in the second case).
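The two codings of Illustration 12 can be compared side by side. This Python sketch (names are our own) fits the line with X in half-years and in years, and forecasts 1972 with each. Carried unrounded, both give about 106.08; the figure of 106.09 above comes from rounding b to 3.667 before multiplying by 11.

```python
years = list(range(1963, 1971))
earnings = [38, 40, 65, 72, 69, 60, 87, 95]   # Rs. lakhs
mid = sum(years) / len(years)                  # 1966.5


def fit(x, y):
    """a = sum(Y)/N and b = sum(XY)/sum(X^2); valid here because sum(X) = 0."""
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b


x_half = [2 * (yr - mid) for yr in years]   # -7, -5, ..., +7: unit = half a year
x_year = [yr - mid for yr in years]         # -3.5, ..., +3.5: unit = one year

a1, b1 = fit(x_half, earnings)
a2, b2 = fit(x_year, earnings)
forecast_half = a1 + b1 * 2 * (1972 - mid)  # X = +11
forecast_year = a2 + b2 * (1972 - mid)      # X = +5.5

print(a1, round(b1, 3), round(forecast_half, 2))
print(a2, round(b2, 3), round(forecast_year, 2))
```

Halving the unit of X doubles every deviation, which halves b but leaves a and every trend value unchanged.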


Illustration 13. Below are given the figures of production (in thousand
maunds) of a sugar factory:

Year    Production            Year    Production
        (thousand maunds)             (thousand maunds)
1963    77                    1968    91
1965    88                    1969    98
1966    94                    1972    90
1967    85

(i) Fit a straight line by the 'least squares' method and tabulate the trend
values.

(ii) Eliminate the trend. What components of the time series are thus left
over?

(iii) What is the monthly increase in the production of sugar?
(M. Com., Delhi, 1972)
Solution. (i) The equation of the straight line trend is $Y_c = a + bX$.
Since $\Sigma X$ is not zero, the values of a and b will be obtained directly by solving
the following two normal equations:

$$\Sigma Y = Na + b\Sigma X \qquad \text{(i)}$$

$$\Sigma XY = a\Sigma X + b\Sigma X^2 \qquad \text{(ii)}$$
Year   Production   X (taking 1967   XY     X²   Trend values   (Y - Y_c)
       Y            as origin)                  Y_c
1963   77           -4               -308   16   83.283         -6.283
1965   88           -2               -176    4   86.043         +1.957
1966   94           -1                -94    1   87.423         +6.577
1967   85            0                  0    0   88.803         -3.803
1968   91           +1                +91    1   90.183         +0.817
1969   98           +2               +196    4   91.563         +6.437
1972   90           +5               +450   25   95.703         -5.703
N = 7  ΣY = 623     ΣX = 1   ΣXY = 159   ΣX² = 51   ΣY_c = 623   Σ(Y - Y_c) = 0

$$623 = 7a + b \qquad \text{(i)}$$

$$159 = a + 51b \qquad \text{(ii)}$$

Multiplying the second equation by 7, we get

$$1113 = 7a + 357b \qquad \text{(iii)}$$

Deducting equation (iii) from (i):

$$-490 = -356b \quad\text{or}\quad b = \frac{490}{356} = 1.38$$

Substituting the value of b in equation (i):

$$623 = 7a + 1.38$$

$$7a = 623 - 1.38 = 621.62 \quad\text{or}\quad a = 88.803$$


So the equation of the straight line trend is

$$Y_c = 88.803 + 1.38X$$

When X = -4, $Y_c = 88.803 + 1.38(-4) = 88.803 - 5.52 = 83.283$
When X = -2, $Y_c = 88.803 + 1.38(-2) = 88.803 - 2.76 = 86.043$
When X = -1, $Y_c = 88.803 - 1.38 = 87.423$
When X = 0, $Y_c = 88.803$
When X = +1, $Y_c = 88.803 + 1.38 = 90.183$
When X = +2, $Y_c = 88.803 + (1.38 \times 2) = 91.563$
When X = +5, $Y_c = 88.803 + (1.38 \times 5) = 95.703$

(ii) After eliminating the trend we are left with cyclical and irregular variations;
since the data are annual, there is no seasonal component.

(iii) The monthly increase in the production of sugar is

$\frac{b}{12}$, i.e., $\frac{1.38}{12} = 0.115$ thousand mds.
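Illustration 13 needs the full normal equations because the years are unevenly spaced and $\Sigma X \neq 0$. The Python sketch below recomputes the sums and solves equations (i) and (ii) by Cramer's rule instead of the elimination used above; the variable names are our own. Carrying b unrounded (490/356) changes a only in the fourth decimal place.

```python
years = [1963, 1965, 1966, 1967, 1968, 1969, 1972]
production = [77, 88, 94, 85, 91, 98, 90]      # thousand maunds
x = [yr - 1967 for yr in years]                # origin 1967; sum(X) = 1, not 0

n = len(x)
sx, sy = sum(x), sum(production)
sxy = sum(xi * yi for xi, yi in zip(x, production))
sxx = sum(xi * xi for xi in x)

# Solve  sy = n*a + b*sx  and  sxy = a*sx + b*sxx  by Cramer's rule.
det = n * sxx - sx * sx
a = (sy * sxx - sx * sxy) / det
b = (n * sxy - sx * sy) / det

print(sx, sy, sxy, sxx)                # 1 623 159 51
print(round(a, 3), round(b, 2))        # 88.803 1.38
print("monthly increase:", round(b / 12, 3), "thousand maunds")   # 0.115
```

Dividing b by 12 converts the yearly slope into the monthly increase asked for in part (iii).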

Merits and Limitations of the Method of Least Squares

Merits. 1. This is a mathematical method of measuring trend and
as such there is no possibility of subjectiveness.

2. The line obtained by this method is called the line of best fit
because it is from this line that the sum of the positive and negative
deviations is zero and the sum of the squares of the deviations is least, i.e.,
$\Sigma(Y - Y_c) = 0$ and $\Sigma(Y - Y_c)^2$ is least.

Limitations

Mathematical curves are useful to describe the general movement
of a time series, but it is doubtful whether any analytical significance
should be attached to them, except in special cases. It is seldom possible
to justify on theoretical grounds any real dependence of a variable on
the passage of time. Variables do change in a more or less systematic
manner over time, but this can usually be attributed to the operation of
other explanatory variables. Thus many economic time series, e.g., national
income, show persistent upward trends over time due to a growth of population
or to a general rise in prices, and the trend element can to a considerable
extent be eliminated by expressing these series per capita or in terms of
'constant purchasing power'. For these reasons mathematical trends are
generally best regarded as tools for describing movements in time series
rather than as theories of the causes of such movements.
It follows that it is extremely dangerous to use trends to forecast future
movements of a time series. Such forecasting, involving as it does extrapolation,
can be valid only if there is theoretical justification for the particular
trend as an expression of a functional relationship between the
variable under consideration and time. But if the trend is purely
descriptive of past behaviour, it can give few clues about future behaviour.
Often the extrapolation of a trend gives ridiculous results which themselves
are prima facie evidence that the trend could not be maintained.

Hence, mathematical methods of fitting trend are not foolproof; in
fact, they can be the source of some of the most serious errors that are
made in statistical work. They should never be used unless rigidly controlled
by a separate logical analysis. Trend fitting depends upon the
judgment of the statistician, and a skilfully made freehand sketch is often
more practical than a refined mathematical formula*.

Second Degree Parabola

The simplest example of the non-linear trend is the second degree
parabola, the equation of which is written in the form:

$$Y_c = a + bX + cX^2$$

When numerical values for a, b and c have been derived, the'trend value 
for any year may be computed by substituting in the equation the value 
of X for that year. The values of a, b and c can be determined by solving 
the following three normal equations simultaneously : 

(i) ZY-—Na--bX--cXX? 

(ii) ZXY-aZX--bYX?--cXX? 

(й) XX?Y-—aXX?-bZXX?--cXX'. " 

Note that the first equation is merely the summation of the given function, the second is the summation of X multiplied into the given function, and the third is the summation of X² multiplied into the given function.

When the time origin is taken at the middle of the series (between the two middle years when the number of years is even), ΣX and ΣX³ would be zero. In that case the above equations are reduced to:

$$\text{(i)}\quad \sum Y = Na + c\sum X^2$$

$$\text{(ii)}\quad \sum XY = b\sum X^2$$

$$\text{(iii)}\quad \sum X^2Y = a\sum X^2 + c\sum X^4$$
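The value of b can now be obtained directly from equation (ii), and those of a and c by solving (i) and (iii) simultaneously. Thus

$$b = \frac{\sum XY}{\sum X^2}, \qquad c = \frac{N\sum X^2Y - \sum X^2 \sum Y}{N\sum X^4 - \left(\sum X^2\right)^2}, \qquad a = \frac{\sum Y - c\sum X^2}{N}$$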
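The shortcut formulas can be checked with a minimal Python sketch. It assumes that X is already measured from the middle of the period, so that ΣX = ΣX³ = 0; the function name `fit_parabola_centred` is hypothetical. With an even number of years the middle falls between two years, so the example (the prices of Illustration 14) measures X in half-year units, -5, -3, ..., 5, which is an assumption of this sketch rather than the layout used in the book.

```python
def fit_parabola_centred(x, y):
    """Least-squares Y = a + bX + cX^2 when X is measured from the middle.

    Assumes sum(X) == sum(X^3) == 0, so b = SXY/SX2 and a, c follow
    from normal equations (i) and (iii).
    """
    if abs(sum(x)) > 1e-9 or abs(sum(v ** 3 for v in x)) > 1e-9:
        raise ValueError("X must be symmetric about the origin")
    n = len(x)
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sy = sum(y)
    sxy = sum(v * w for v, w in zip(x, y))
    sx2y = sum(v ** 2 * w for v, w in zip(x, y))
    b = sxy / sx2
    c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
    a = (sy - c * sx2) / n
    return a, b, c


# Prices 1970-75; X in half-year units measured from midway between 1972 and 1973
x = [-5, -3, -1, 1, 3, 5]
y = [100, 107, 128, 140, 181, 192]
a, b, c = fit_parabola_centred(x, y)
trend = [a + b * v + c * v ** 2 for v in x]
print(round(a, 3), round(b, 3), round(c, 4))
```

In these half-year units 1976 corresponds to X = 7, so a + 7b + 49c is the 1976 estimate; moving the origin changes a, b and c but not the fitted trend values themselves.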

Illustration 14. The prices of a commodity during 1970-75 are given below. Fit a parabola Y = a + bX + cX² to these data. Estimate the price of the commodity for the year 1976.

Year    Prices    Year    Prices
1970     100      1973     140
1971     107      1974     181
1972     128      1975     192

Also plot the actual and trend values on a graph. (B. Com., Gujarat, 1975)


* Riggleman and Frisbee: Business Statistics, p. 312.


Solution. To determine the values of a, b and c we solve the following normal equations, taking 1972 as the origin:

$$\sum Y = Na + b\sum X + c\sum X^2$$

$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$$

$$\sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4$$

Year   Prices (Y)    X    X²    X³    X⁴     XY     X²Y    Trend values
1970      100       -2     4    -8    16    -200     400       97.72
1971      107       -1     1    -1     1    -107     107      110.41
1972      128        0     0     0     0       0       0      126.66
1973      140        1     1     1     1     140     140      146.49
1974      181        2     4     8    16     362     724      169.88
1975      192        3     9    27    81     576   1,728      196.85
N = 6   ΣY = 848   ΣX = 3   ΣX² = 19   ΣX³ = 27   ΣX⁴ = 115   ΣXY = 771   ΣX²Y = 3,099   Σ(trend) = 848.01

Substituting these totals:

848 = 6a + 3b + 19c          ... (i)
771 = 3a + 19b + 27c         ... (ii)
3,099 = 19a + 27b + 115c     ... (iii)

Multiplying equation (ii) by 2 and keeping (i) as it is, we get

1,542 = 6a + 38b + 54c
  848 = 6a + 3b + 19c

Subtracting, 35b + 35c = 694     ... (iv)

Multiplying equation (ii) by 19 and equation (iii) by 3, we get

14,649 = 57a + 361b + 513c
 9,297 = 57a + 81b + 345c

Subtracting, 280b + 168c = 5,352     ... (v)

Multiplying equation (iv) by 8, we have 280b + 280c = 5,552     ... (vi)

Subtracting (v) from (vi), 112c = 200, or c = 1.786.

Substituting the value of c in equation (iv):

35b + 35(1.786) = 694
35b = 694 - 62.51 = 631.49, or b = 18.04

Putting the values of b and c in equation (i):

848 = 6a + 3(18.04) + 19(1.786) = 6a + 54.12 + 33.93
6a = 848 - 88.05 = 759.95, or a = 126.66

Thus a = 126.66, b = 18.04 and c = 1.786, and the trend equation is

Y = 126.66 + 18.04X + 1.786X²     (Origin 1972; time unit 1 year)

Substituting the values of X in this equation gives the trend values:

when X = -2, Y = 126.66 + 18.04(-2) + 1.786(-2)² = 126.66 - 36.08 + 7.144 = 97.724, or 97.72
when X = -1, Y = 126.66 - 18.04 + 1.786 = 110.406, or 110.41
when X = 0,  Y = 126.66
when X = 1,  Y = 126.66 + 18.04 + 1.786 = 146.486, or 146.49
when X = 2,  Y = 126.66 + 36.08 + 7.144 = 169.884, or 169.88
when X = 3,  Y = 126.66 + 54.12 + 16.074 = 196.854, or 196.85

Price for 1976:

For 1976, X would be equal to 4. Putting X = 4 in the equation,

Y = 126.66 + 18.04(4) + 1.786(4)² = 126.66 + 72.16 + 28.576 = 227.396

Thus the estimated price of the commodity for the year 1976 is Rs. 227.40.

The graph of the actual and trend values is given below:

[Figure: Second degree parabola (Y = a + bX + cX²), actual and trend values]
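When the origin is not at the middle of the series (as in Illustration 14, where 1972 is the origin and ΣX = 3), all three normal equations must be solved in full. A minimal pure-Python sketch, relying only on the normal equations stated earlier, builds the 3 × 3 system and solves it by Gaussian elimination with partial pivoting; the function name `solve_parabola` is hypothetical. It is a convenient check on elimination by hand, and for these data it gives a ≈ 126.66, b ≈ 18.04 and c ≈ 1.786.

```python
def solve_parabola(x, y):
    """Solve the three normal equations for Y = a + bX + cX^2 (any origin)."""
    def sx(k):
        return sum(v ** k for v in x)                  # sum of X^k

    def sxy(k):
        return sum(v ** k * w for v, w in zip(x, y))   # sum of X^k * Y

    # Augmented matrix: one row per normal equation (i), (ii), (iii)
    m = [[sx(0), sx(1), sx(2), sxy(0)],
         [sx(1), sx(2), sx(3), sxy(1)],
         [sx(2), sx(3), sx(4), sxy(2)]]
    # Forward elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    # Back substitution gives (a, b, c)
    sol = [0.0] * 3
    for r in (2, 1, 0):
        rest = sum(m[r][k] * sol[k] for k in range(r + 1, 3))
        sol[r] = (m[r][3] - rest) / m[r][r]
    return tuple(sol)


years = [1970, 1971, 1972, 1973, 1974, 1975]
prices = [100, 107, 128, 140, 181, 192]
x = [yr - 1972 for yr in years]                 # origin 1972, as in the solution
a, b, c = solve_parabola(x, prices)
estimate_1976 = a + b * 4 + c * 4 ** 2
print(round(a, 2), round(b, 2), round(c, 3), round(estimate_1976, 2))
```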


MEASURING TRENDS BY LOGARITHMS 

The trends discussed so far were plotted on arithmetic scales. 
Trends may also be plotted on a semi-logarithmic (or semi-log) chart in 
the form of a straight line or a non-linear curve. A straight line on the 
semi-log chart shows the increase of Y values of a time series at a constant 
rate. (A straight line on an arithmetic chart indicates an increase by a constant amount.) When the trend is a non-linear curve on the semi-log chart, an upward curve shows an increase at varying rates, depending on the shape of the slope: the steeper the slope, the higher the rate of increase.

The types of trend usually computed by logarithms are : 

1. Exponential trends. 

2. Growth curves. 


Exponential Trends 
The equation of the exponential curve is of the form

$$Y = ab^X$$

Putting the equation in logarithmic form, we get

$$\log Y = \log a + X \log b$$

When plotted on a semi-logarithmic graph, the curve gives a straight line. However, on an arithmetic chart the curve gives a non-linear trend. In order to find the values of a and b, the two normal equations to be solved are:

$$\sum \log Y = N \log a + \log b \sum X$$

$$\sum (X \log Y) = \log a \sum X + \log b \sum X^2$$
When deviations are taken from the middle year, i.e., Σx = 0, the above equations take the following form:

$$\sum \log Y = N \log a, \qquad \therefore \log a = \frac{\sum \log Y}{N}$$

$$\sum (x \log Y) = \log b \sum x^2, \qquad \therefore \log b = \frac{\sum (x \log Y)}{\sum x^2}$$


Steps. The steps in fitting such a curve are:

1. Find the time deviation of each year from the middle year and denote these deviations by x.

2. Square these deviations and obtain Σx².

3. Obtain logarithms of the variable Y.

4. Multiply log Y by the corresponding time deviation and obtain Σ(x log Y).

5. Divide Σ log Y by N. This gives the value of log a.

6. Divide Σ(x log Y) by Σx². This gives the value of log b, i.e., the rate of growth or the slope of the line.

7. Put the value of log a against the middle year, and add the slope of the line (the value of log b) once for each year after the middle year, or subtract it once for each year before, to get the trend ordinates in logarithms.

8. Take the antilogs of these logs to arrive at the actual trend values.


Illustration 15. The sales of a company in thousands of rupees for the years 1965 through 1971 are given below:

Years:  1965   1966   1967   1968   1969   1970   1971
Sales:    32     47     65     92    132    190    275

Estimate the sales figure for the year 1972 using an equation of the form Y = ab^x, where x = years and Y = sales. (M. Com., Delhi, 1974)

Solution. FITTING AN EQUATION OF THE FORM Y = ab^x

Year   Sales (Y)    x     log Y     x²    x log Y
1965      32       -3    1.5051     9    -4.5153
1966      47       -2    1.6721     4    -3.3442
1967      65       -1    1.8129     1    -1.8129
1968      92        0    1.9638     0     0
1969     132       +1    2.1206     1    +2.1206
1970     190       +2    2.2788     4    +4.5576
1971     275       +3    2.4393     9    +7.3179
N = 7   Σx = 0   Σ log Y = 13.7926   Σx² = 28   Σ(x log Y) = +4.3237

log a = Σ log Y / N = 13.7926 / 7 = 1.9704
log b = Σ(x log Y) / Σx² = 4.3237 / 28 = 0.1544, or about 0.154

Hence log Y = 1.9704 + 0.154x.

For 1972, x would be +4. When x = 4,

log Y = 1.9704 + 0.154(4) = 2.5864
Y = antilog 2.5864 = 385.9

Thus the estimated sales for the year 1972 are 385.9 thousand rupees.
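The eight steps can be sketched in Python for the sales data of Illustration 15. The sketch assumes base-10 logarithms and x measured from the middle year, as in the text; the function name `fit_exponential_centred` is hypothetical. Because it carries log b = 0.1544 unrounded instead of 0.154, its 1972 estimate comes out at about 387 thousand rupees rather than 385.9, a reminder of how sensitive exponential extrapolation is to rounding of log b.

```python
import math


def fit_exponential_centred(x, y):
    """Fit Y = a * b**x via log Y = log a + x log b (base-10 logs).

    Assumes x is measured from the middle year, so sum(x) == 0.
    Returns (log_a, log_b).
    """
    if abs(sum(x)) > 1e-9:
        raise ValueError("x must be deviations from the middle year")
    logs = [math.log10(v) for v in y]                  # step 3
    log_a = sum(logs) / len(logs)                      # step 5
    log_b = (sum(v * l for v, l in zip(x, logs))       # steps 4 and 6
             / sum(v ** 2 for v in x))
    return log_a, log_b


sales = [32, 47, 65, 92, 132, 190, 275]     # 1965-1971, thousand rupees
x = list(range(-3, 4))                      # deviations from 1968 (step 1)
log_a, log_b = fit_exponential_centred(x, sales)
trend = [10 ** (log_a + log_b * v) for v in x]   # steps 7 and 8
estimate_1972 = 10 ** (log_a + log_b * 4)
print(round(log_a, 4), round(log_b, 4), round(estimate_1972, 1))
```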
Second Degree Curves Fitted to Logarithms 


We may come across data which, when plotted on semi-logarithmic paper, continue to show curvature, being concave either upward or downward; in other words, the ratio of change may be either increasing or decreasing. In such cases, we may fit a second-degree curve to the logarithms of the Y values using

$$\log Y = \log a + X \log b + X^2 \log c$$

Taking the X origin at the middle of the period, the three normal equations are:

$$\text{(i)}\quad \sum \log Y = N \log a + \log c \sum x^2$$

$$\text{(ii)}\quad \sum (x \log Y) = \log b \sum x^2$$

$$\text{(iii)}\quad \sum (x^2 \log Y) = \log a \sum x^2 + \log c \sum x^4$$
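These three equations are the parabola equations applied to log Y, so they can be solved the same way. A minimal sketch, assuming base-10 logarithms and x measured from the middle of the period; `fit_log_parabola_centred` is a hypothetical name. The text gives no data for this case, so the example builds synthetic values from log a = 2, log b = 0.1 and log c = 0.01 purely to confirm that the equations recover them.

```python
import math


def fit_log_parabola_centred(x, y):
    """Fit log Y = log a + x log b + x^2 log c with sum(x) == sum(x^3) == 0."""
    n = len(x)
    ly = [math.log10(v) for v in y]
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    s_ly = sum(ly)
    s_xly = sum(v * l for v, l in zip(x, ly))
    s_x2ly = sum(v ** 2 * l for v, l in zip(x, ly))
    log_b = s_xly / sx2                                        # equation (ii)
    log_c = (n * s_x2ly - sx2 * s_ly) / (n * sx4 - sx2 ** 2)   # from (i) and (iii)
    log_a = (s_ly - log_c * sx2) / n                           # back into (i)
    return log_a, log_b, log_c


xs = list(range(-3, 4))
ys = [10 ** (2 + 0.1 * v + 0.01 * v ** 2) for v in xs]   # synthetic check data
la, lb, lc = fit_log_parabola_centred(xs, ys)
print(round(la, 6), round(lb, 6), round(lc, 6))
```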
2. Growth Curves 

In economic data we very often come across phenomena in which growth is very slow at first, but as the product is accepted the demand increases by a greater amount each year, and finally, as the market becomes more and more fully developed, the amount of growth each year becomes less. The curve continues to grow more and more slowly, approaching an upper limit but not reaching it. Such series are best represented by growth curves. Growth curves do not reach a maximum and turn down in the manner of the second degree parabola.

A number of different growth curves have been used to measure secular trend, but the curves used most widely to describe growth are the Gompertz curve and the Pearl-Reed or logistic curve.


The equation of the Gompertz curve is

$$Y = k\,a^{b^X}$$

which, when put into logarithmic form, becomes

$$\log Y = \log k + (\log a)\, b^X$$
The Gompertz curve serves to describe the growth of series which, while increasing, seem to approach some maximum value as a limit. Although the growth continues, it does so at a decreasing rate.


The equation of the logistic curve is

$$\frac{1}{Y} = K + ab^X$$

The logistic curve has been applied widely to population data of 
various kinds, both human and non-human, and it has also provided a 
good fit to many economic series pertaining to industrial growth. 

If on the left-hand side of this equation we write Y instead of 1/Y, we get Y = K + ab^X, which is the equation of another curve used in fitting trends, the so-called modified exponential. Since the fitting of growth curves involves a considerable amount of mathematical detail, we shall not go into it here.
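Although fitting growth curves is not taken up here, the shapes described above can be seen simply by evaluating the three equations. In this sketch the parameter values are hypothetical, chosen only so that each curve rises towards an upper limit of 100: for the Gompertz curve, 0 < a < 1 and 0 < b < 1 make Y approach k; for the logistic, a > 0 and 0 < b < 1 make Y approach 1/K; for the modified exponential, a < 0 and 0 < b < 1 make Y approach K.

```python
def gompertz(x, k, a, b):
    """Y = k * a**(b**x)."""
    return k * a ** (b ** x)


def logistic(x, K, a, b):
    """1/Y = K + a * b**x, so Y = 1 / (K + a * b**x)."""
    return 1.0 / (K + a * b ** x)


def modified_exponential(x, K, a, b):
    """Y = K + a * b**x."""
    return K + a * b ** x


# Hypothetical parameters; each curve has an upper limit of 100
curves = {
    "Gompertz": (gompertz, (100, 0.1, 0.8)),
    "logistic": (logistic, (0.01, 0.5, 0.7)),
    "modified exponential": (modified_exponential, (100, -80, 0.75)),
}
for name, (f, params) in curves.items():
    values = [f(x, *params) for x in range(30)]
    print(name, [round(v, 1) for v in values[::5]])
```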
Selecting the Type of Trend 

We have given above different ways of fitting trends. However, these are not all; some other equations might also reasonably be used. Even though each series presents its own individual problem, most series can be handled by the methods which we have described. Of course, what we try to do in any particular case is to select the equation or the method of measuring trend which best describes the gradual and consistent pattern of growth.

The choice of the particular type of equation that best describes the data is often difficult and needs a considerable amount of judgment and experience.

While deciding the type of trend, the first step consists of plotting 
the data on arithmetic paper. If the trend is not linear but either : 

(a) upward and concave upward, or 

(b) downward and concave upward 
the data should be plotted on semi-logarithmic paper. Examination of the plotted data often provides an adequate basis for deciding upon the type of trend to use. When further guidance is needed, an approximate trend may be drawn by inspection and the following tests applied to the smoothed curve*:

1. If the first differences tend to be constant, use a straight line.

2. If the second differences tend to be constant, use a second degree curve.

3. If the approximate trend when plotted on arithmetic paper is a straight line, use a straight line.

4. If the first differences of the logarithms are constant, use an exponential curve.

5. If the second differences of the logarithms are constant, fit a second degree curve to the logarithms.

6. If the first differences tend to decrease by a constant percentage, use a modified exponential.

7. If the first differences resemble a skewed frequency curve, use a Gompertz curve or a more complex logistic curve.
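A rough sketch of how tests 1, 2, 4, 5 and 6 above might be automated on an already smoothed, positive series. The tolerance for "tends to be constant" (every item within 2 per cent of the mean) and the function names are assumptions of this sketch. Test 3 is a visual check and test 7 needs judgment, so both are left to inspection. The tests are applied in the order listed because a straight line also has constant (zero) second differences. At least five or six points are needed for the second-difference tests to mean anything.

```python
import math


def diffs(seq):
    """First differences of a sequence."""
    return [q - p for p, q in zip(seq, seq[1:])]


def nearly_constant(seq, tol=0.02):
    """True if every item lies within tol (as a fraction) of the mean."""
    mean = sum(seq) / len(seq)
    scale = max(abs(mean), 1e-12)
    return all(abs(v - mean) <= tol * scale for v in seq)


def suggest_trend(values, tol=0.02):
    """Apply tests 1, 2, 4, 5 and 6 in turn to a smoothed, positive series."""
    d1 = diffs(values)
    if nearly_constant(d1, tol):                       # test 1
        return "straight line"
    if nearly_constant(diffs(d1), tol):                # test 2
        return "second degree curve"
    logs = [math.log10(v) for v in values]
    if nearly_constant(diffs(logs), tol):              # test 4
        return "exponential curve"
    if nearly_constant(diffs(diffs(logs)), tol):       # test 5
        return "second degree curve fitted to the logarithms"
    if all(d != 0 for d in d1):                        # test 6
        ratios = [q / p for p, q in zip(d1, d1[1:])]
        if nearly_constant(ratios, tol) and ratios[0] < 1:
            return "modified exponential"
    return "inspect first differences (Gompertz or logistic?)"


print(suggest_trend([100 - 50 * 0.8 ** x for x in range(6)]))
```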
Choice of the Trend Period 

In order to simplify the discussion of trend computation, the 
illustrations in this chapter are based on 7-8 years' data only. However, 
wherever possible, the period should be longer. The longer the period, 
the less the trend values will be distorted by cyclical or random influences. 
The period employed should encompass a number of business cycles and
should begin and end in such a way that distortion is avoided. This 
purpose can be accomplished by using a period that starts and finishes 
either in prosperity or depression, or by beginning during recovery and 
ending during recession. 

Conversion of Annual Trend Values to Monthly Values

Usually annual figures are employed for trend computations. However, it is sometimes required to obtain monthly trend ordinates. In converting straight line trends from an annual to a monthly basis, two situations must be clearly distinguished. For series such as sales, production or earnings, the annual figure is the total of the monthly figures. Here it is necessary to divide both a and b by 12 to reduce them to monthly levels; in other words, on the average, monthly sales or production is one-twelfth of the annual total. The b value must then be divided by 12 once again in order to convert from annual to monthly increments.


* Croxton and Cowden: Applied General Statistics.
The necessity of dividing b twice by 12, that is, by 144 altogether, must be clearly understood. The division of the annual change by 12 gives us only the change from a month in a given year to the same month in the following year, or the annual change in monthly magnitudes. However, we are here seeking an expression of the change from each month to the next, that is, the monthly change in monthly magnitude. Thus, b must be divided by 12 again.

In conclusion, to convert an annual trend equation to a monthly basis when the original data are given as totals, a is divided by 12 and b is divided by 144.

If the time unit of X in the trend equation is 6 months, b is divided by 72 instead of 144.

Illustration 16. Convert the following annual trend equation for tea production in India to a monthly trend equation:

Y = 108 + 1.58X
(Origin 1966; time unit 1 year; Y = tea production in million kg)

Solution. The monthly trend equation is obtained by dividing a by 12 and b by 144. Thus the monthly trend equation will be

$$Y = \frac{108}{12} + \frac{1.58}{144}X = 9 + 0.011X$$

(Origin July 1, 1966; time unit 1 month; Y = monthly production in million kg)

Where data are given as monthly averages per year, the value of the constant a in the annual trend equation is the average of the twelve monthly figures; in other words, it is already at the monthly level. The value of b now represents the annual change in monthly magnitude. As a result, to convert an annual trend equation when annual data are expressed as monthly averages, a remains unchanged and b is divided by 12 only.
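The rules for the two situations can be summarised in a small sketch; `annual_to_monthly` and its parameters are hypothetical names. It only rescales a and b and leaves the origin where it was.

```python
def annual_to_monthly(a, b, data_are_totals=True, x_unit_months=12):
    """Convert a straight-line trend Y = a + bX to a monthly basis.

    data_are_totals: True when each annual figure is the total of 12 months,
    False when it is already a monthly average.
    x_unit_months: length of the time unit of X (12 for a year, 6 for a half-year).
    """
    if data_are_totals:
        # a: annual total -> average month; b: same, then per month of time
        return a / 12, b / (12 * x_unit_months)
    # a is already at the monthly level; only the time unit of b changes
    return a, b / x_unit_months


print(annual_to_monthly(108, 1.58))   # Illustration 16
```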


Shifting the Trend Origin

In computing trends, the middle of the time series is often used as the origin in order to short-cut the computations. But very often we need to change the origin of the trend equation to some other point in the series, either to facilitate comparisons of trend values among neighbouring years or to convert a trend equation from an annual to a monthly basis. Shifting the origin is a very simple matter. For example, consider the trend equation for the data of Illustration 11:

Y = 110 - 2X
(Origin 1966; time unit 1 year)

If we wish to shift the origin of the trend equation to 1960, we note that this year precedes the stated origin of 1966 by 6 time units. Consequently we must deduct 6 times the annual increment, i.e., 6b, from the trend value of 1966, which amounts to putting X = -6 in the equation:

Y = 110 - 2(-6) = 110 + 12 = 122

The value 122 becomes the trend value at the new origin 1960, and the trend equation may now be written as

Y = 122 - 2X
(Origin 1960; time unit 1 year)

In changing the origin only the value of a changes; the value of b remains the same, since the slope of a straight line does not change. Shifting the origin, therefore, amounts to establishing a new value of a.
The procedure applied for changing the origin in the case of a straight line trend can be extended to cover parabolic trend equations. For example, if the second-degree trend equation is

Y = 40 + 2x + 0.4x²
(Origin 1974)

and we wish to shift the origin of this equation to 1970, we replace x by (x - 4), since 1970 precedes 1974 by 4 years. The calculations would be as follows:

Y = 40 + 2(x - 4) + 0.4(x - 4)² = 40 + 2x - 8 + 0.4(x² - 8x + 16)
Y = 40 + 2x - 8 + 0.4x² - 3.2x + 6.4 = 38.4 - 1.2x + 0.4x²
(Origin 1970)
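Both examples follow one rule: replace X by (X + k), where k is the position of the new origin measured in time units from the old one (negative if the new origin is earlier), and expand. A minimal sketch of that rule using the binomial theorem, with the hypothetical name `shift_origin`, handles straight lines and parabolas alike.

```python
from math import comb


def shift_origin(coeffs, k):
    """Move the origin of Y = c0 + c1*X + c2*X**2 + ... by k time units.

    coeffs: [c0, c1, c2, ...] with the old origin.
    k: new origin minus old origin, in time units (negative if earlier).
    Returns the coefficients with the new origin, using old X = new X + k.
    """
    new = [0.0] * len(coeffs)
    for j, c in enumerate(coeffs):
        # expand c * (X + k)**j by the binomial theorem
        for i in range(j + 1):
            new[i] += c * comb(j, i) * k ** (j - i)
    return new


print(shift_origin([110, -2], 1960 - 1966))      # straight line, origin 1966 -> 1960
print(shift_origin([40, 2, 0.4], 1970 - 1974))   # parabola, origin 1974 -> 1970
```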
II. MEASUREMENT OF SEASONAL VARIATIONS


Most phenomena in economics and business show seasonal patterns. When data are expressed annually there is no seasonal variation. However, monthly or quarterly data frequently exhibit strong seasonal movements, and considerable interest attaches to devising a pattern of average seasonal variation. For example, if we observe the sales of a bookseller we find that sales are at their maximum in the quarter July-September, when most students purchase books. If we know by how much the sales of this quarter are usually above or below those of the previous quarter for seasonal reasons, we shall be able to answer a very basic question: was a rise in sales due to an underlying upward tendency, or simply to the fact that this quarter is usually seasonally higher than the previous quarter?

In order to analyse seasonal variation, it is necessary to assume 
that the seasonal pattern is superimposed on a series of values and is 
independent of these in the sense that the same pattern is superimposed 
irrespective of the level of the series, i.e., the June quarter always contri- 
butes so much more or so much less to the series. 

Before attempting to measure seasonal variation certain preliminary 
decisions must be made. For example, it is necessary to decide whether 
weekly, quarterly or monthly indexes are required. This will be decided 
in the light of the nature of the problem and the type of data available. 

To obtain a statistical description of a pattern of seasonal variation 
it will be desirable to first free the data from the effects of trend, cycles 
and irregular variation. Once these other components have been eliminated, we can calculate, in index form, a measure of seasonal variation which is usually referred to as a seasonal index. Thus the measures of seasonal variation are called seasonal indexes (per cent).

For monthly data, a seasonal index consists of 12 numbers, one for each month, expressing the variation that has taken place typically in each month of a year, or of a number of years. Thus a seasonal index may be specific or typical. A specific seasonal index refers to the seasonal changes during a particular year. A typical seasonal index is obtained by averaging a number of specific seasonals; it is thus a generalized expression of seasonal variations for a series. Seasonal indexes are given as percentages of their average, i.e., each month is represented by a figure expressing it as a percentage of the average month. For example, if the seasonal index for January is 75, this means that in January sales, orders, purchases or whatever our data happen to be, are 75 per cent of those of the average month.
There are many techniques available for computing an index of seasonal variation. Many of the simpler methods were devised prior to the development of electronic computers and were designed to sacrifice precision for ease of computation. Any acceptable modern method for computing such an index will probably be programmed for computer solution. The method should be designed to meet the following criteria:


(1) It should measure only the seasonal forces in the data; it should not be influenced by the forces of trend or cycle that may be present.

(2) It should modify the erratic fluctuations in the data with an acceptable system of averaging.

(3) It should recognize slowly changing seasonal patterns that may be present and modify the index to keep up with these changes.

The following are some of the methods more popularly used for 
measuring seasonal variations : 

1. Method of Simple Averages (Weekly, Monthly or Quarterly).

2. Ratio-to-Trend Method. 

3. Ratio-to-Moving Average Method. 

4. Link Relative Method. 


1. Method of Simple Averages 


This is the simplest method of obtaining a seasonal index. The 
following steps are necessary for calculating the index : 

(i) Arrange the unadjusted data by years and months (or quarters if quarterly data are given).

(ii) Find the totals of January, February, etc. 

(iii) Divide each total by the number of years for which data are 
given. For example, if we are given monthly data for five years then we 
shall first obtain total for each month for five years and divide each total 
by 5 to obtain an average. 

(iv) Obtain an average of monthly averages by dividing the total of 
monthly averages by 12. 

(v) Taking the average of monthly averages as 100, compute the percentages of the various monthly averages as follows:

$$\text{Seasonal Index for January} = \frac{\text{Monthly average for January}}{\text{Average of monthly averages}} \times 100$$

If, instead of the average of each month, the totals of each month are obtained, we will get the same result.

The following example illustrates the method:

Illustration 17. The monthly consumption of electric power, in millions of kWh, for street lighting in a big city during 1971-75 is given below:

Year  Jan.  Feb.  Mar.  Apr.  May  June  July  Aug.  Sept.  Oct.  Nov.  Dec.
1971   318   281   278   250   231   216   223   245   269    302   325   347
1972   342   309   299   268   249   236   242   262   288    321   342   364
1973   367   328   320   287   269   251   259   284   309    345   367   394
1974   392   349   342   311   290   273   282   305   328    364   389   417
1975   420   378   370   334   314   296   305   330   356    396   422   452

Find the seasonal variation by the method of monthly averages. (B.A. Hons., Econ., Delhi)
Solution: COMPUTATION OF SEASONAL INDICES BY THE METHOD OF MONTHLY AVERAGES

         Consumption of monthly electric power    Monthly    Five-yearly   Seasonal
Months   1971   1972   1973   1974   1975         totals     average       variation index
(1)      (2)    (3)    (4)    (5)    (6)          (7)        (8)           (9)
Jan.     318    342    367    392    420          1,839      367.8         116.1
Feb.     281    309    328    349    378          1,645      329.0         103.9
Mar.     278    299    320    342    370          1,609      321.8         101.6
Apr.     250    268    287    311    334          1,450      290.0          91.6
May      231    249    269    290    314          1,353      270.6          85.4
June     216    236    251    273    296          1,272      254.4          80.3
July     223    242    259    282    305          1,311      262.2          82.8
Aug.     245    262    284    305    330          1,426      285.2          90.1
Sept.    269    288    309    328    356          1,550      310.0          97.9
Oct.     302    321    345    364    396          1,728      345.6         109.1
Nov.     325    342    367    389    422          1,845      369.0         116.5
Dec.     347    364    394    417    452          1,974      394.8         124.7
Total                                             19,002     3,800.4       1,200
Average                                            1,583.5     316.7         100


The above calculations are explained below: 

1. Column No. 7 gives the total for each month for five years.

2. In column No. 8 each total of column No. 7 has been divided by 5 to obtain an average for each month.

3. The average of monthly averages is obtained by dividing the total of monthly averages by 12, i.e., 3,800.4 / 12 = 316.7.

4. In column No. 9, each monthly average has been expressed as a percentage of the average of monthly averages. Thus, the percentage for January

= (367.8 / 316.7) × 100 = 116.1

Percentage for February = (329.0 / 316.7) × 100 = 103.9.

If, instead of monthly data, we are given weekly or quarterly data, 
we shall compute weekly or quarterly averages by following the same 
procedure as explained above. 
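The five steps translate directly into a short Python sketch, shown here with the data of Illustration 17; the function name is hypothetical. It works for monthly and quarterly tables alike, since it simply averages each column and expresses every average as a percentage of the average of the averages.

```python
def seasonal_indices_simple_average(table):
    """Seasonal indices by the method of simple averages.

    table: one row per year, one column per month (or quarter).
    Returns each column average as a percentage of the average of averages.
    """
    n_years = len(table)
    averages = [sum(col) / n_years for col in zip(*table)]   # steps (ii)-(iii)
    grand_average = sum(averages) / len(averages)            # step (iv)
    return [100 * avg / grand_average for avg in averages]   # step (v)


power = [  # Illustration 17, 1971-1975, January to December
    [318, 281, 278, 250, 231, 216, 223, 245, 269, 302, 325, 347],
    [342, 309, 299, 268, 249, 236, 242, 262, 288, 321, 342, 364],
    [367, 328, 320, 287, 269, 251, 259, 284, 309, 345, 367, 394],
    [392, 349, 342, 311, 290, 273, 282, 305, 328, 364, 389, 417],
    [420, 378, 370, 334, 314, 296, 305, 330, 356, 396, 422, 452],
]
indices = seasonal_indices_simple_average(power)
print([round(v, 1) for v in indices])
```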

Illustration 18. Assuming that trend is absent, determine whether there is any seasonality in the data given below:

Year   1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1972       37            41            33            35
1973       37            39            36            36
1974       40            41            33            31
1975       33            44            40            40

What are the seasonal indices for the various quarters?

Solution: COMPUTATION OF SEASONAL INDICES

Year             1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1972                 37            41            33            35
1973                 37            39            36            36
1974                 40            41            33            31
1975                 33            44            40            40
Total               147           165           142           142
Average             36.75         41.25         35.5          35.5
Seasonal Index      98.66        110.74         95.30         95.30
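Notes for calculating the Seasonal Index

The average of averages = (36.75 + 41.25 + 35.5 + 35.5) / 4 = 149 / 4 = 37.25

Seasonal Index = (Quarterly average / Average of averages) × 100

Seasonal Index for the first quarter = (36.75 / 37.25) × 100 = 98.66

Seasonal Index for the second quarter = (41.25 / 37.25) × 100 = 110.74

Seasonal Index for the third and fourth quarters = (35.5 / 37.25) × 100 = 95.30

Since the indices depart from 100, the second quarter being about 11 per cent above the average quarter and the third and fourth quarters about 5 per cent below it, the data do show seasonality.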


Merits and Limitations of the Method of Monthly Averages 


This method is the simplest of all methods of measuring seasonality. However, it is not a very good method. It assumes that there is no trend component in the series, i.e., O = C × S × I. But this is not a justified assumption. Most economic series have trends and, therefore, the seasonal index computed by this method is actually an index of trends and seasonals. Furthermore, the effects of cycles on the original values may or may not be eliminated by the averaging process. This depends on the duration of the cycle and the term of the average, that is, on the number of months included in the average. Thus, this method is seldom of any value. In its simplest form the method only serves the purpose where no definite trend exists.

Ratio-to-Trend Method


This method of calculating a seasonal index (also known as the percentage-to-trend method) is relatively simple and yet an improvement over the method of simple averages explained in the preceding section. This method assumes that the seasonal variation for a given month is a constant fraction of trend. The ratio-to-trend method presumably isolates the seasonal factor in the following manner. Trend is eliminated when the ratios are computed. In effect

$$\frac{T \times S \times C \times I}{T} = S \times C \times I$$

Random elements are supposed to disappear when the ratios are averaged. A careful selection of the period of years used in the computation is expected to cause the influences of prosperity or depression to offset each other and thus remove the cycle. For series that are not subject to pronounced cyclical or random influences and for which trend can be computed accurately, this method may suffice. The steps in the computation of a seasonal index by this method are:


1. Trend values are obtained by applying the method of least squares.

2. The next step is to divide the original data month by month by the corresponding trend values and multiply these ratios by 100. The values so obtained are now free from trend, and the problem that remains is to free them also of irregular and cyclical movements.


3. In order to free the values from irregular and cyclical movements, the figures given for the various years for January, February, etc., are averaged with any one of the usual measures of central value, for instance, the median or the mean. If the data are examined month by month, it is sometimes possible to ascribe a definite cause to unusually high or low values. When such causes are found to be associated with irregular variations (extremely bad weather, an earthquake, famine and the like) they may be cast out, and the mean of the remaining items is referred to as a modified mean. Since such scrutiny of the data requires considerable knowledge of prevailing conditions and is to a large extent subjective, it is often desirable to use the median, which is generally not affected by very high or very low values.

4. The seasonal index for each month is expressed as a percentage of the average month. The sum of the 12 values must equal 1,200 (i.e., they must average 100 per cent). If it does not, an adjustment is made by multiplying each index by a suitable factor, namely 1,200 / (the sum of the 12 values). This gives the final seasonal index.

Illustration 19. Find the seasonal variations by the ratio-to-trend method from the data given below:


Year   1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1973       30            40            36            34
1974       34            52            50            44
1975       40            58            54            48
1976       54            76            68            62
1977       80            92            86            82
Solution. For determining seasonal variation by the ratio-to-trend method, we first determine the trend for the yearly data and then convert it to quarterly data.

CALCULATING TREND BY THE METHOD OF LEAST SQUARES

Year   Yearly    Yearly        Deviations from     xY     x²    Trend values
       totals    average (Y)   mid-year (x)
1973    140        35             -2              -70     4        32
1974    180        45             -1              -45     1        44
1975    200        50              0                0     0        56
1976    260        65             +1               65     1        68
1977    340        85             +2              170     4        80
N = 5             ΣY = 280                        ΣxY = 120   Σx² = 10

The equation of the straight line trend is Y = a + bX, where

a = ΣY / N = 280 / 5 = 56,   b = ΣxY / Σx² = 120 / 10 = 12

Quarterly increment = 12 / 4 = 3


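Each yearly trend value relates to the middle of the year, i.e., midway between the 2nd and 3rd quarters. The trend values for the 2nd and 3rd quarters are therefore 1.5 (half the quarterly increment) below and above it, and those for the 1st and 4th quarters a further 3 below and above. For 1973, for example, they are 32 - 4.5 = 27.5, 30.5, 33.5 and 36.5.

TREND VALUES

Year   1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1973      27.5          30.5          33.5          36.5
1974      39.5          42.5          45.5          48.5
1975      51.5          54.5          57.5          60.5
1976      63.5          66.5          69.5          72.5
1977      75.5          78.5          81.5          84.5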


The given values are expressed as percentages of the corresponding trend values. Thus for the 1st quarter of 1973 the percentage is (30 / 27.5) × 100 = 109.09, for the 2nd quarter (40 / 30.5) × 100 = 131.15, and so on.
GIVEN QUARTERLY VALUES AS % OF TREND VALUES

Year                        1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1973                          109.09        131.15        107.46         93.15
1974                           86.08        122.35        109.89         90.72
1975                           77.67        106.42         93.91         79.34
1976                           85.04        114.29         97.84         85.52
1977                          105.96        117.20        105.52         97.04
Total                         463.84        591.41        514.62        445.77
Average                        92.77        118.28        102.92         89.15
Seasonal index (adjusted)      92.05        117.36        102.12         88.46


Total of averages = 92.77 + 118.28 + 102.92 + 89.15 = 403.12. Since the total is more than 400, an adjustment is made by multiplying each average by 400 / 403.12, and the final indices are obtained.
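A minimal sketch of the ratio-to-trend calculation for quarterly data, following Illustration 19: a least-squares line through the yearly averages, quarterly trend values placed half an increment and one and a half increments either side of the middle of each year, ratios averaged with the arithmetic mean (the text notes that the median may be preferred), and a final adjustment so that the indices total 400. The function name is hypothetical.

```python
def ratio_to_trend(quarterly):
    """Quarterly seasonal indices by the ratio-to-trend method.

    quarterly: one row [Q1, Q2, Q3, Q4] per consecutive year.
    """
    n = len(quarterly)
    yearly_avg = [sum(row) / 4 for row in quarterly]
    x = [i - (n - 1) / 2 for i in range(n)]          # deviations from mid-year
    a = sum(yearly_avg) / n
    b = sum(v * y for v, y in zip(x, yearly_avg)) / sum(v * v for v in x)
    step = b / 4                                     # quarterly increment
    ratios = []
    for v, row in zip(x, quarterly):
        mid = a + b * v                              # trend at middle of year
        trend = [mid + (q - 1.5) * step for q in range(4)]   # Q1..Q4
        ratios.append([100 * obs / t for obs, t in zip(row, trend)])
    means = [sum(col) / n for col in zip(*ratios)]
    total = sum(means)
    return [m * 400 / total for m in means]          # indices total 400


data = [[30, 40, 36, 34], [34, 52, 50, 44], [40, 58, 54, 48],
        [54, 76, 68, 62], [80, 92, 86, 82]]          # Illustration 19, 1973-77
print([round(s, 2) for s in ratio_to_trend(data)])
```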


Merits and Limitations of the Ratio-to-Trend Method 

Merits. (1) Compared with the method of monthly averages, this method is certainly a more logical procedure for measuring seasonal variations. It has an advantage over the moving average procedure too, for it has a ratio-to-trend value for each month for which data are available. Thus there is no loss of data as occurs in the case of moving averages. This is a distinct advantage, especially when the period covered by the time series is very short.

(2) It is simple to compute and easy to understand.

Limitations. The main defect of the ratio-to-trend method is that if there are pronounced cyclical swings in the series, the trend, whether a straight line or a curve, can never follow the actual data as closely as a 12-month moving average does. In consequence a seasonal index computed by the ratio-to-moving average method may be less biased than one calculated by the ratio-to-trend method.

Ratio-to-Moving Average Method* 

The ratio-to-moving average method, also known as the percentage of moving average method, is the most widely used method of measuring seasonal variations. The steps necessary for determining the seasonal pattern by this method are:

1. Eliminate seasonality from the data by ironing it out of the original data. Since seasonal variations recur every year, that is, since the fluctuations have a time span of 12 months, a centered 12-month moving average tends to eliminate these fluctuations. (In the case of quarterly data, a centered 4-quarter moving average must be used.) The centered 12-month moving average, which aims to eliminate seasonal and irregular fluctuations (S and I), represents the remaining elements of the original data, namely, trend and cycles. Thus, the centered 12-month moving average approximates T × C.

2. Express the original data for each month as a percentage of the 
centered 12-month moving average corresponding to it. 


* The computation by this method is identical with the computation of the ratio-to-trend seasonal index just described, except that a moving average trend is substituted for the least squares trend used in the previous calculation.
3. Divide each monthly item of the original data by the corresponding 12-month moving average, and list the quotients as 'Percent of Moving Average'. We have now succeeded in eliminating from the original data, to a considerable extent, the disturbing influences of trend and cycles. It remains to rid the data of irregular variations. By averaging these percentages for a given month (Step 4) the irregular factors tend to cancel out, and the average itself reflects the seasonal influence alone.


4. The purpose of this step is to average and, in the process of averaging, to eliminate the irregular factor. We assume that unusually high or unusually low values of the seasonal relatives for any month are caused by irregular factors. The elimination of extremes may be achieved while we are averaging all Januarys, Februarys and the like. We do this by using an appropriate type of average. The median is appropriate since it is not affected by extremes. Thus, by using the median as an average we can obtain the typical seasonal relative for each month, which will not be affected by irregular factors.


mean of the remaining seasonal relatives is taken. A separate table is 
prepared in which the calculations involved in this Step are shown. These 


5. If the total is not equal to 1,200 (an average of 100 per cent), an adjustment is made to eliminate the discrepancy. The adjustment consists of multiplying the average of each month obtained in step 4 by

$$\frac{1{,}200}{\text{the total of the modified means for 12 months}}$$


This adjustment is made not only to achieve accuracy, but also 
because when we come to eliminate seasonality from the original data we 
do not wish to raise or lower the level of the data unduly. Thus, ifa 
Seasonal index aggregates more than 1,200 (or averages more than 
100) then the original data adjusted in terms of it will total less than the 
unadjusted original data. If it totals less than 1,200, the opposite would 
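The Step 5 adjustment simply rescales the averages so that they total 1,200 (or 400 for quarterly data). A minimal Python sketch, assuming the averages are supplied as a list in calendar order; the function name `adjust_to_base` is illustrative, not from the text. The example uses the twelve medians obtained in Illustration 21, whose total is 1,195.8:

```python
def adjust_to_base(averages, base=100.0):
    """Rescale seasonal averages so that their mean equals `base`.

    With 12 monthly averages the adjusted values total 1,200; with
    4 quarterly averages they total 400.
    """
    factor = base * len(averages) / sum(averages)   # e.g. 1,200 / total of the averages
    return [a * factor for a in averages], factor


# Medians from Illustration 21 (their total is 1,195.8).
medians = [70.0, 74.5, 75.2, 83.7, 88.5, 95.9,
           102.8, 106.5, 109.4, 116.3, 133.0, 140.0]
indices, factor = adjust_to_base(medians)
print(round(factor, 4))          # 1.0035
print(round(sum(indices), 2))    # 1200.0
```

With the unrounded factor the indices total exactly 1,200; rounding the factor to 1.004, as is convenient by hand, leaves a small excess over 1,200.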


The logical reasoning behind this method follows from the fact that a 12-month moving average can be considered to represent the influence of trend and cycle, T × C. Dividing the original data by it gives

$$\frac{T \times S \times C \times I}{T \times C} = S \times I$$

Thus the ratio to the moving average, from which this method gets its name, represents irregular and seasonal influences. If the ratios for each month over a period of years are then averaged, most random influences will usually be eliminated. Hence, in effect,

$$\frac{S \times I}{I} = S$$
Illustration 20. Calculate seasonal indices by the 'ratio to moving average method' from the following data:

Year    1st Quarter    2nd Quarter    3rd Quarter    4th Quarter
1971    68             62             61             63
1972    65             58             66             61
1973    68             63             63             67
                                                (B. Com., Delhi, 1975)

Solution. CALCULATION OF SEASONAL INDICES BY 'RATIO TO MOVING AVERAGE' METHOD

Year  Quarter  Given     4-figure  2-figure  4-figure  Given figure
               figures   moving    moving    moving    as % of moving
                         totals    totals    average   average
1971  I        68
      II       62
                         254
      III      61                  505       63.125    96.63
                         251
      IV       63                  498       62.250    101.20
                         247
1972  I        65                  499       62.375    104.21
                         252
      II       58                  502       62.750    92.43
                         250
      III      66                  503       62.875    104.97
                         253
      IV       61                  511       63.875    95.50
                         258
1973  I        68                  513       64.125    106.04
                         255
      II       63                  516       64.500    97.67
                         261
      III      63
      IV       67

CALCULATION OF SEASONAL INDEX

Percentage to Moving Average
Year             1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1971             -             -             96.63         101.20
1972             104.21        92.43         104.97        95.50
1973             106.04        97.67         -             -
Total            210.25        190.10        201.60        196.70
Average          105.125       95.05         100.80        98.35
Seasonal Index   105.30        95.21         100.97        98.52

$$\text{Arithmetic average of averages} = \frac{399.325}{4} = 99.83$$

By expressing each quarterly average as a percentage of 99.83 we obtain the seasonal indices:

$$\text{Seasonal index of 1st quarter} = \frac{105.125}{99.83} \times 100 = 105.30$$

$$\text{Seasonal index of 2nd quarter} = \frac{95.05}{99.83} \times 100 = 95.21$$

$$\text{Seasonal index of 3rd quarter} = \frac{100.80}{99.83} \times 100 = 100.97$$

$$\text{Seasonal index of 4th quarter} = \frac{98.35}{99.83} \times 100 = 98.52$$
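The whole procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the text's exact hand procedure: the series must start with the first season, the moving averages are kept at full precision instead of being rounded, and the function names are illustrative. Passing `median` as the `average` argument gives the median variant used for the monthly data of Illustration 21.

```python
from statistics import mean, median


def centred_moving_average(values, period):
    """Centred moving average for an even period (4 quarters or 12 months).

    Returns a list aligned with `values`; positions where no centred
    average exists (period // 2 items at each end) hold None.
    """
    half = period // 2
    # Moving total k covers values[k : k + period].
    totals = [sum(values[k:k + period]) for k in range(len(values) - period + 1)]
    centred = [None] * len(values)
    for k in range(len(totals) - 1):
        # Two adjacent moving totals straddle item k + half; their sum divided
        # by 2 * period is the centred average (the "2-figure total / 8" step).
        centred[k + half] = (totals[k] + totals[k + 1]) / (2 * period)
    return centred


def ratio_to_moving_average(values, period, average=mean):
    """Seasonal indices by the ratio-to-moving-average method."""
    cma = centred_moving_average(values, period)
    ratios = [[] for _ in range(period)]
    for i, (y, m) in enumerate(zip(values, cma)):
        if m is not None:
            ratios[i % period].append(100.0 * y / m)   # % of moving average
    averages = [average(r) for r in ratios]
    grand = mean(averages)
    # Express each seasonal average as a percentage of the average of averages.
    return [100.0 * a / grand for a in averages]


quarterly = [68, 62, 61, 63, 65, 58, 66, 61, 68, 63, 63, 67]   # Illustration 20
print([round(s, 2) for s in ratio_to_moving_average(quarterly, 4)])
# [105.3, 95.21, 100.97, 98.52]

monthly = [10, 12, 13, 15, 16, 16, 17, 18, 18, 19, 22, 22,    # 1972
           11, 11, 12, 13, 14, 14, 15, 15, 15, 16, 18, 20,    # 1973
           10, 12, 11, 12, 13, 15, 15, 17, 18, 20, 22, 24,    # 1974
           12, 13, 13, 15, 16, 18, 20, 20, 21, 22, 24, 25]    # 1975
monthly_indices = ratio_to_moving_average(monthly, 12, average=median)  # Illustration 21
```

Expressing each average as a percentage of the average of averages is the same as multiplying by 400 / total (or 1,200 / total), so the indices always total 400 or 1,200. Because nothing is rounded along the way, the monthly results can differ somewhat from the hand-worked tables of Illustration 21.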
Illustration 21. Apply the ratio to moving average method to ascertain seasonal indices from the following data:

Sales (in thousand units)
Month    1972   1973   1974   1975
Jan.     10     11     10     12
Feb.     12     11     12     13
March    13     12     11     13
April    15     13     12     15
May      16     14     13     16
June     16     14     15     18
July     17     15     15     20
Aug.     18     15     17     20
Sept.    18     15     18     21
Oct.     19     16     20     22
Nov.     22     18     22     24
Dec.     22     20     24     25
Solution. COMPUTATION OF 12-MONTH MOVING AVERAGES

Columns: (1) month; (2) sales (thousand units); (3) 12-month moving total; (4) 12-month moving average; (5) 2-month moving total of col. 4; (6) centred 12-month moving average (col. 5 ÷ 2); (7) percentage of centred 12-month moving average (col. 2 ÷ col. 6 × 100). The entries in cols. 3 and 4 are placed between the two months they straddle.

(1)       (2)    (3)    (4)      (5)      (6)      (7)
1972
Jan.      10
Feb.      12
March     13
April     15
May       16
June      16
                 198    16.50
July      17                     33.08    16.54    102.8
                 199    16.58
Aug.      18                     33.08    16.54    108.8
                 198    16.50
Sept.     18                     32.92    16.46    109.4
                 197    16.42
Oct.      19                     32.67    16.33    116.3
                 195    16.25
Nov.      22                     32.33    16.17    136.1
                 193    16.08
Dec.      22                     32.00    16.00    137.5
                 191    15.92
1973
Jan.      11                     31.67    15.83    69.5
                 189    15.75
Feb.      11                     31.25    15.62    70.4
                 186    15.50
March     12                     30.75    15.37    78.1
                 183    15.25
April     13                     30.25    15.12    86.0
                 180    15.00
May       14                     29.67    14.83    94.4
                 176    14.67
June      14                     29.17    14.59    95.9
                 174    14.50
July      15                     28.92    14.46    103.7
                 173    14.42
Aug.      15                     28.92    14.46    103.7
                 174    14.50
Sept.     15                     28.92    14.46    103.7
                 173    14.42
Oct.      16                     28.75    14.37    107.6
                 172    14.33
Nov.      18                     28.58    14.29    126.0
                 171    14.25
Dec.      20                     28.58    14.29    140.0
                 172    14.33
1974
Jan.      10                     28.66    14.33    70.0
                 172    14.33
Feb.      12                     28.83    14.41    83.3
                 174    14.50
March     11                     29.25    14.62    75.2
                 177    14.75
April     12                     29.83    14.91    80.5
                 181    15.08
May       13                     30.50    15.25    85.2
                 185    15.42
June      15                     31.17    15.58    90.6
                 189    15.75
July      15                     31.67    15.83    95.4
                 191    15.92
Aug.      17                     31.92    15.96    106.5
                 192    16.00
Sept.     18                     32.17    16.08    111.9
                 194    16.17
Oct.      20                     32.59    16.29    122.8
                 197    16.42
Nov.      22                     33.09    16.54    133.0
                 200    16.67
Dec.      24                     33.59    16.79    142.9
                 203    16.92
1975
Jan.      12                     34.25    17.12    70.1
                 208    17.33
Feb.      13                     34.91    17.45    74.5
                 211    17.58
March     13                     35.41    17.70    73.4
                 214    17.83
April     15                     35.83    17.91    83.7
                 216    18.00
May       16                     36.17    18.08    88.5
                 218    18.17
June      18                     36.42    18.21    98.8
                 219    18.25
July      20
Aug.      20
Sept.     21
Oct.      22
Nov.      24
Dec.      25
COMPUTATION OF SEASONAL INDICES

Month    1972    1973    1974    1975    Median    Seasonal Index
Jan.     -       69.5    70.0    70.1    70.0      70.28
Feb.     -       70.4    83.3    74.5    74.5      74.80
March    -       78.1    75.2    73.4    75.2      75.50
April    -       86.0    80.5    83.7    83.7      84.03
May      -       94.4    85.2    88.5    88.5      88.85
June     -       95.9    90.6    98.8    95.9      96.28
July     102.8   103.7   95.4    -       102.8     103.20
Aug.     108.8   103.7   106.5   -       106.5     106.92
Sept.    109.4   103.7   111.9   -       109.4     109.84
Oct.     116.3   107.6   122.8   -       116.3     116.76
Nov.     136.1   126.0   133.0   -       133.0     133.53
Dec.     137.5   140.0   142.9   -       140.0     140.56
Total                                    1,195.8   1,200.55*


It should be noted that there are only three values for each month, since the moving average failed to provide averages for the first half of 1972 and the last half of 1975. The median has been used to average the figures given for the individual months. The sum of the 12 values obtained is 1,195.8. It is necessary, therefore, to make an adjustment so that the total is 1,200. The adjustment is done by multiplying the average (median) values by

$$\frac{1{,}200}{1{,}195.8} \approx 1.004$$

The final result thus obtained gives us the seasonal indices. The interpretation of this index is very simple. Typical April sales are 84.03 per cent of those of the average month, typical November sales are 133.53 per cent of those of the average month, and so on.
Merits and Limitations of the Ratio-to-Moving Average Method 

Merits. This method of measuring seasonal variation is considered to be the most satisfactory and, as such, is more widely used in practice than other methods. The index obtained by the ratio-to-moving average method ordinarily does not fluctuate so much as the index based on straight-line trends. Mathematical methods of avoiding the effects of the business cycle are not usually needed, for the 12-month moving average follows the cyclical course of the actual data quite closely. Therefore, the index ratios are often more representative of the data from which they are obtained than in the case of the ratio-to-trend method. Also, the ratio-to-moving average method allows for greater flexibility.

* The difference is due to approximation.

Limitations. However, one drawback of this method is that seasonal indices cannot be obtained for each month for which data are available. When a 12-month moving average is taken, six months at the beginning and six months at the end are left out, for which we cannot calculate seasonal indices.


4. Link Relative Method 

Amongst all the methods of measuring seasonal variation, the link relative method is the most difficult one. When this method is adopted, the following steps are taken to calculate the seasonal variation indices:

1. Calculate the link relatives of the seasonal figures. Link relatives are calculated by dividing the figure of each season* by the figure of the immediately preceding season and multiplying it by 100:

$$\text{Link relative} = \frac{\text{Current season's figure}}{\text{Previous season's figure}} \times 100$$

These percentages are called link relatives since they link each month (or quarter or other time period) to the preceding one.

2. Calculate the average of the link relatives for each season. While calculating the average we might take the arithmetic average, but the median is probably better. The arithmetic average would give undue weight to extreme cases which were not due primarily to seasonal influences.

3. Convert these averages into chain relatives on the base of the first season.

4. Calculate the chain relative of the first season on the base of the last season. There will be some difference between this chain relative of the first season and the chain relative (100) obtained for it in Step 3. This difference will be due to the effect of long-term changes. It is, therefore, necessary to correct these chain relatives.

5. For correction, the chain relative of the first season calculated by the first method (Step 3) is deducted from the chain relative of the first season calculated by the second method (Step 4). The difference is divided by the number of seasons. The resulting figure multiplied by 1, 2, 3 (and so on) is deducted respectively from the chain relatives of the 2nd, 3rd, 4th (and so on) seasons. These are the corrected chain relatives.

6. Express the corrected chain relatives as percentages of their average. These provide the required seasonal indices by the method of link relatives.
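The six steps above can be sketched in Python. This is a minimal sketch, assuming the data are given year by year with every season present; it uses arithmetic means of the link relatives (as Illustration 22 does) even though the text notes that the median may be preferable, and the function name is illustrative.

```python
from statistics import mean


def link_relative_indices(table):
    """Seasonal indices by the method of link relatives.

    `table` holds one row per year, each row listing the figures for every
    season (e.g. four quarters) in order.
    """
    n = len(table[0])                                 # number of seasons
    flat = [x for year in table for x in year]
    links = [[] for _ in range(n)]
    for i in range(1, len(flat)):
        # Step 1: current season's figure / previous season's figure x 100.
        links[i % n].append(100.0 * flat[i] / flat[i - 1])
    avg = [mean(r) for r in links]                    # Step 2 (arithmetic mean)

    chain = [100.0]                                   # Step 3: first season = 100
    for s in range(1, n):
        chain.append(avg[s] * chain[-1] / 100.0)

    first_again = avg[0] * chain[-1] / 100.0          # Step 4: first season on last
    d = (first_again - 100.0) / n                     # Step 5: discrepancy per season
    corrected = [c - s * d for s, c in enumerate(chain)]

    m = mean(corrected)                               # Step 6: % of their average
    return [100.0 * c / m for c in corrected]


quarters = [[60, 65, 78, 87], [54, 79, 84, 73], [68, 65, 93, 64],
            [72, 58, 75, 85], [66, 73, 80, 71]]       # Illustration 22, one row per year
print([round(s, 2) for s in link_relative_indices(quarters)])
```

Because it does not round the link relatives to one decimal place, its indices (about 88.2, 94.0, 113.2 and 104.6) differ from the worked example only in the second decimal.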

The following example will illustrate the process : 


Illustration 22. Apply the method of link relatives to the following data and calculate seasonal indices.

QUARTERLY FIGURES
Quarter   1971   1972   1973   1974   1975
I         60     54     68     72     66
II        65     79     65     58     73
III       78     84     93     75     80
IV        87     73     64     85     71


* The word season refers to a time period. In the case of monthly data season would refer to a month, and in the case of quarterly data to a quarter.


Solution. CALCULATION OF SEASONAL INDICES BY METHOD OF LINK RELATIVES

Link relatives
Year                    Quarter I    Quarter II   Quarter III   Quarter IV
1971                    -            108.3        120.0         111.5
1972                    62.1         146.3        106.3         86.9
1973                    93.2         95.6         143.1         68.8
1974                    112.5        80.6         129.3         113.3
1975                    77.6         110.6        109.6         88.8

Arithmetic average      345.4/4      541.4/5      608.3/5       469.3/5
                        = 86.35      = 108.28     = 121.66      = 93.86

Chain relatives         100          100×108.28   121.66×108.28 93.86×131.73
                                     /100         /100          /100
                                     = 108.28     = 131.73      = 123.64

Corrected chain         100          108.28-1.675 131.73-3.35   123.64-5.025
relatives                            = 106.605    = 128.38      = 118.615

Seasonal indices        100/113.4    106.605/113.4 128.38/113.4 118.615/113.4
                        ×100         ×100          ×100         ×100
                        = 88.18      = 94.01       = 113.21     = 104.60


For the above table the calculations are explained below:

Chain relative of the first quarter (on the basis of the first quarter) = 100

$$\text{Chain relative of the first quarter (on the basis of the last quarter)} = \frac{86.35 \times 123.64}{100} = 106.7$$

The difference between these chain relatives = 106.7 - 100 = 6.7.

$$\text{Difference per quarter} = \frac{6.7}{4} = 1.675$$

Adjusted chain relatives are obtained by subtracting 1 × 1.675, 2 × 1.675 and 3 × 1.675 from the chain relatives of the 2nd, 3rd and 4th quarters respectively.

Seasonal variation indices have been calculated as follows:

$$\text{Average of corrected chain relatives} = \frac{100 + 106.605 + 128.38 + 118.615}{4} = \frac{453.6}{4} = 113.4$$

$$\text{Seasonal variation index} = \frac{\text{Corrected chain relative}}{113.4} \times 100$$

Which Method to Use? Four different methods of measuring seasonal variations have been discussed above. The question now arises which method to adopt in a particular case. The choice will very much depend upon the nature of the data and the object of the investigation. Amongst all the methods, the method of monthly averages is the simplest. But it is a crude method, as it assumes that there is no trend component in the time series. This method can be used only if a seasonal rhythm dominates the data, and trend and cycle are negligible. The method of link relatives was widely used at one time, but its disadvantages seem to outweigh its advantages and it has currently fallen into some disfavour*. On weighing the merits and demerits of the ratio-to-trend method and the ratio-to-moving average method, one finds that the ratio-to-moving average method has several advantages over the ratio-to-trend method. Hence, in general, it may be said that, because of its theoretical and practical advantages, the ratio-to-moving average method should be preferred to other methods.


Selecting the Period to Compute Seasonal Indices 


In order to simplify the examples a period of only 4-5 years was 
employed to compute the seasonal indexes. In actual fact it is suggested 
that many more years be included. This is because a seasonal index based on a short period is often unduly affected by conditions
prevailing during one phase of the business cycle or by powerful random 
influences. The period should encompass at least one and, if possible, 
several business cycles. The long span of years offers greater likelihood 
that irregular and cyclical forces will cancel out or at least have their in- 
fluence minimized. Ten years is often viewed as a practical minimum. 
In selecting the period care should be taken to have the period begin and 
end at the same phase of the business cycle in order to avoid distortions 
that could result if more years of prosperity than of depression were 
included. 


Average in Computing Seasonals 


In each of the methods described for computing seasonal varia- 
tions, the individual monthly averages were averaged in order to 
eliminate random influences and any remaining cyclical elements. In two 
of these examples the average selected was the arithmetic mean and in 
another the median. This poses the question of the relative merits of 
these or other averages for the purpose at hand. Because the mean is 
affected by every item in the series, it should be used when the number 
of years is large. However, when the period is shorter the use of the 
mean is not recommended because extreme items, occasioned by the very 
random or cyclical factors that the calculation is designed to eliminate, 
distort its value. The median, on the other hand, is a positional average. 
As such it is not affected in any way by extreme values, but it may be 
unduly influenced by the inclusion or exclusion of a year or two in the 
calculation. A positional mean as suggested by Wessel and Willett avoids 
the disadvantages of both the mean and the median. It is computed by 
taking the arithmetic mean of the central items in the series. Suppose the 
following are the arranged ratios to the moving average for May. 

80 85 90 / 95 96 98 100 102 105 / 107 112 120 


In this case the arithmetic mean of the middle six items would be 
employed as the seasonal index. It is obvious that extreme items cannot 
influence this value and that detail of position alone is not significant. 
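The positional mean is easy to compute: sort the ratios, drop the same number of items from each end and average the rest. A minimal Python sketch (the function name and the `trim` parameter are illustrative), applied to the twelve May ratios above:

```python
def positional_mean(values, trim):
    """Mean of the central items after removing `trim` items from each end."""
    ordered = sorted(values)
    central = ordered[trim:len(ordered) - trim]
    return sum(central) / len(central)


may_ratios = [80, 85, 90, 95, 96, 98, 100, 102, 105, 107, 112, 120]
print(round(positional_mean(may_ratios, 3), 2))   # 99.33, the mean of 95 ... 105
```

Choosing `trim` fixes how many extreme items on each side are ignored; with `trim=0` the result is the ordinary arithmetic mean.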


Uses of Seasonal Index and its Limitations 


Uses. A seasonal index may be used either analytically or synthetically. Analytically, a seasonal index is employed to adjust original data in order to yield deseasonalized data that permit the study of short-run fluctuations of a series not associated with seasonal variations. The procedure of adjusting data for seasonal variations is a simple one. It involves merely the division of each of the original observations by the appropriate seasonal index for that month, i.e.,

$$\frac{T \times S \times C \times I}{S} = T \times C \times I$$

* Freund and Williams : Modern Business Statistics, p. 445.

Synthetically a seasonal index may be used for economic forecasting 
and managerial control. Management usually benefits from examining the 
seasonal patterns of its own business, patterns that directly influence its 
employment, production, purchase, sales and inventory policies. For 
example, if a firm expects to sell Rs. 36,000,000 worth of goods during 
the forthcoming year, average monthly sales of Rs. 3,000,000 are 
anticipated. If, however, the volume of sales is subject to seasonal 
fluctuation, the actual monthly values will deviate significantly from this 
average. Should the seasonal index for May be 120, the firm can expect 
sales of Rs. 3,600,000 during that month; in comparison, an index 
of 90 for December would lead them to anticipate sales of only Rs. 
2,700,000. Possible solutions for seasonality available to individual firms 
are numerous. By special price and advertising policies a producer con- 
fronted with a strong seasonal demand for his product may try to stabilise 
sales by encouraging off-season consumption. 
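The arithmetic of this example amounts to a one-line rule: divide the annual forecast by 12 and scale by the month's seasonal index. A minimal Python sketch, assuming the index uses 100 for the average month (the function name is illustrative):

```python
def expected_monthly_sales(annual_total, seasonal_index):
    """Average monthly sales scaled by the month's seasonal index (100 = average)."""
    return annual_total / 12 * seasonal_index / 100


print(expected_monthly_sales(36_000_000, 120))   # May: 3600000.0
print(expected_monthly_sales(36_000_000, 90))    # December: 2700000.0
```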

The most promising solution for seasonality is diversification. It benefits not only the firm but also society at large. Whenever diversification
is possible, real costs of seasonal variations can be reduced or even 
eliminated. Diversification involves the development of production lines 
having complementary seasonal movements. While some expand seasonally 
others contract. Consequently labour and facilities can be transferred from 
one line to another as seasonal changes take place. However, diversification 
is possible only in those lines of production that have approximately the same labour and equipment requirements.


Limitations. While making use of seasonal indexes in business and 
economic problems, the following precautions and limitations in their 
application should be kept in mind : 

1. No technique can measure seasonal variation precisely. The various methods of measuring seasonal variations are based on the rather unrealistic assumption that the seasonals are changing in some regular and systematic pattern.


2. In developing seasonal index we obtain a series of measures—a 
measure for January, a measure for February, and so forth—each of 
which generally differs from 100. However, we must remember that these 
measures are only rough estimates. Hence, if we obtain a seasonal index 
in which the values are all close to 100—for example, if the index values 
for the consecutive months are 102, 99, 103, 98, etc., it may well be that 
no real monthly seasonal variation exists in the series and that the small 
differences from 100 are only due to random influences or imperfect 
measurements. 

3. Even if the computed index of seasonal variation indicates a pronounced pattern, it may have no significance for a particular year. It must be remembered that any seasonal index of the type we have described represents an average pattern during a number of years. If the pattern of seasonal variation in the series is not a stable one, any average pattern may be a poor representation of the actual seasonal variation taking place during a given year.


Deseasonalized Data 

By deseasonalized data we mean the data which show how things would have been, or would be, if there were no seasonal fluctuations. In order to obtain such data we have to remove the effect of seasonal variations. The process of deseasonalizing is very simple. If we are given monthly data, the only thing that is to be done for deseasonalizing is to divide each value by the seasonal index of the appropriate month and multiply the quotient by 100. This process is based on a very simple logic: if an index for May is, say, 125, this means that May sales (or whatever our data happen to be) are 125% of those of the average month or, in other words, 125 per cent of what they would have been if there had been no seasonal variation. Hence, to see what May sales would have been if there had been no seasonal variation, we multiply the observed sales by 100/125 or, what is the same, divide them by 1.25. Hence, to obtain deseasonalized data, divide the actual data by the appropriate seasonal indices expressed as ratios (1.25 rather than 125). Thus the data would be free from the seasonal impact.
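Deseasonalizing is a division season by season. A minimal Python sketch, assuming the series starts with the first season and that the indices use 100 for "no seasonal effect"; the function name and the sales figure of 250 are hypothetical, and the quarterly indices are those obtained in Illustration 20:

```python
def deseasonalise(values, indices):
    """Divide each observation by its season's index expressed as a ratio."""
    period = len(indices)
    return [y / (indices[i % period] / 100) for i, y in enumerate(values)]


# Hypothetical May sales of 250 with a May index of 125.
print(deseasonalise([250], [125]))    # [200.0]

seasonal = [105.30, 95.21, 100.97, 98.52]            # Illustration 20
print([round(v, 2) for v in deseasonalise([68, 62, 61, 63], seasonal)])
```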

III. MEASUREMENT OF CYCLICAL VARIATION

Business cycles are perhaps the most important type of fluctuation 
in economic data. Certainly they have received a lot of attention in 
economic literature. Despite the importance of business cycles, they 
are the most difficult type of economic fluctuations to measure. This 
is because successive cycles vary so widely in timing, amplitude and 
pattern, and because the cyclical rhythm is inextricably mixed up with 
irregular factors. For these reasons it is impossible to construct
meaningful typical cycle indexes or curves similar to those that have been 
developed for trends and seasonals. The various methods* used for 
measuring cyclical variations are : 

1. Residual method.

2. Reference cycle analysis method. 

3. Direct method. 

4. Harmonic analysis method. 
Only the first two methods, which are in popular use, are discussed here.


Residual Method 

Amongst all the methods of arriving at estimates of the cyclical movements of time series, the residual method is the most commonly used. This method consists of eliminating seasonal variation and trend, thus obtaining the cyclical-irregular movements. Symbolically:

$$\frac{T \times S \times C \times I}{S} = T \times C \times I \qquad\text{and}\qquad \frac{T \times C \times I}{T} = C \times I$$

Next, the data are usually smoothed in order to obtain the cyclical movements, which are sometimes termed the cyclical relatives, since they are always percentages. It is because the cyclical-irregular, or the cyclical, movements remain as residuals that this procedure is referred to as the residual method.

* For details refer to Croxton and Cowden : Applied General Statistics, Ch. 16.


Limitations of the Residual Method. If the trend ordinates perfectly depicted the pattern of secular change and if the seasonal index exactly reflected seasonal influences, the residual method would leave values reflecting only cyclical and irregular influences. Because such perfection is rarely encountered, the computed values almost always contain some trend and seasonal elements. This condition will be more or less serious depending on how well or poorly the trend line and the seasonal index represent secular and seasonal forces. If a straight-line trend is employed to describe an essentially curvilinear secular movement, figures presumably adjusted for trend will be grossly distorted. The distortion would also occur if the seasonal index were not descriptive of the seasonal pattern at the time in question. Thus the residual method is based on the assumption that trend and seasonal can be accurately measured and, therefore, be removed at least in large part*.
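The residual method can be sketched in Python: deseasonalize, divide by the trend ordinates, and smooth what remains. This is only an illustration under assumptions: the trend values and seasonal indices are supplied by the user, and the smoothing uses a simple 3-item moving average, one possible choice, since the text does not prescribe a smoothing method. The function names and the short quarterly series are hypothetical.

```python
def cyclical_irregular(values, trend, indices):
    """C x I as percentages: (T x S x C x I) / S / T x 100."""
    period = len(indices)
    out = []
    for i, (y, t) in enumerate(zip(values, trend)):
        tci = y / (indices[i % period] / 100)   # remove S, leaving T x C x I
        out.append(100 * tci / t)               # remove T, leaving C x I (in %)
    return out


def smooth(series, span=3):
    """Centred moving average of odd length `span` to damp irregular movements."""
    half = span // 2
    return [sum(series[i - half:i + half + 1]) / span
            for i in range(half, len(series) - half)]


# Hypothetical quarterly series with given trend ordinates and seasonal indices.
sales = [105, 98, 120, 101, 112, 100, 126, 108]
trend = [100, 102, 104, 106, 108, 110, 112, 114]
seasonal = [105, 95, 110, 90]
ci = cyclical_irregular(sales, trend, seasonal)
print([round(v, 1) for v in smooth(ci)])   # smoothed cyclical relatives
```

As the limitations above warn, any error in `trend` or `seasonal` passes straight into the computed cyclical relatives.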


Reference Cycle Analysis or the National Bureau Method 


The National Bureau of Economic Research has developed a different method of analysing cyclical variations, which it has used in the study of more than 1,000 specific time series. This method is of value in analysing past cycles only. The National Bureau procedure aims to answer two sets of questions:


(1) Is there in a given series a pattern of change that repeats itself (with more or less variation) in successive cycles in business at large? If so, what are its characteristics?


(2) Is there in a given series a wave movement peculiar to that series? If so, what are its characteristics?

The questions under (1) are concerned with the behaviour of individual series during successive waves of expansion and contraction in the general economy; those under (2) relate to periodic or semi-periodic fluctuations in individual series. A procedure involving 'reference dates' has been designed by the National Bureau of Economic Research as a device which allows one not only to compare each series with a standard set of dates and to observe the behaviour of individual series during expansion and contraction of general business, but also to compare the results for the various individual series.


The first step is the selection of the reference dates, which are the dates of the peaks and troughs of business cycles. The reference dates, which cover a duration of over one year and not over ten or twelve years, were chosen after examination of a large number of economic time series and after study of the 'contemporary' reports of observers of the business scene.


The next step consists of processing the data of the individual series in order to obtain a cyclical pattern for each series for the period between each two successive reference troughs. Each period is the same for all series, enabling one to compare the results for the various series. The processing of each series proceeds as follows:


* Wessel and Willett : Statistics Applied to Economics and Business.
(1) The data are adjusted for seasonal variation.*

(2) The seasonally adjusted data are divided into reference cycle segments, these segments corresponding to the intervals between adjacent reference troughs.

(3) For each segment, the monthly values are expressed as per- 
centages of the average of all the values in the segment. These are 
“reference cycle relatives". As a result of this step, all series, no matter 
what the original unit, are in percentage form. This step eliminates 
inter-cycle trend, since the average of the relatives for each cycle is 100, 
but it does not eliminate intra-cycle trend. The inclusion of intra-cycle 
trend is regarded as desirable, since it “helps to reveal and to explain 
what happens during business cycles". 

(4) Each reference cycle segment is broken into nine stages, to 
correspond to the same nine stages in the business cycle, and the refe- 
rence cycle relatives are averaged for each of nine stages. The nine stages 
are identified as follows : 

(i) The 3 months centered on the initial trough. 

(ii) The first third of the expansion period. 

(iii) The second third of the expansion period. 

(iv) The last third of the expansion period. 

(v) The 3 months centered on the peak. 

(vi) The first third of the contraction period. 
(vii) The second third of the contraction period. 

(viii) The last third of the contraction period. 

(ix) The 3 months centered on the terminal trough. 

The nine-stage averages for each reference cycle segment serve to reduce the erratic movements in a series and give a reference cycle pattern for the particular series under consideration.
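Steps (2) and (3) above, splitting the seasonally adjusted series at the reference troughs and expressing each segment as percentages of its own average, can be sketched in Python. This is a simplified illustration, not the National Bureau's full procedure: the trough positions are supplied by the user, each segment is assumed to include both of its bounding troughs, the nine-stage averaging of step (4) is left out, and the names and figures are hypothetical.

```python
from statistics import mean


def reference_cycle_relatives(values, troughs):
    """Reference cycle relatives for each trough-to-trough segment.

    `values` is a seasonally adjusted series; `troughs` lists the positions
    of successive reference troughs within it.
    """
    segments = []
    for start, end in zip(troughs, troughs[1:]):
        segment = values[start:end + 1]          # trough to trough, inclusive
        base = mean(segment)                     # cycle average becomes 100
        segments.append([100 * v / base for v in segment])
    return segments


series = [90, 96, 104, 110, 103, 92, 95, 101, 112, 118, 109, 97]
for relatives in reference_cycle_relatives(series, [0, 5, 11]):
    print([round(r, 1) for r in relatives])
```

Each segment averages exactly 100, which is how the step removes inter-cycle trend while leaving the intra-cycle trend in place.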

Although the National Bureau method of cycle analysis may seem 
more complicated and cumbersome than the residual technique, it has 
proved to be the simplest and most accurate way of comparing the cycli- 
cal variations of individual series with those of general business. In 
addition it is free of errors that might be introduced were secular trend 
improperly estimated. The latter advantage is indeed significant when 
series whose trend patterns are not clear are under analysis. Its princi- 
pal shortcoming is found in the fact that, because no cycle can be 
studied in this way until it is completed, the method cannot be applied 
to current data. 

IV. MEASUREMENT OF IRREGULAR VARIATION 


The irregular component in a time series represents the residue of fluctuations after trend, cyclical and seasonal movements have been accounted for. Thus, if the original data are divided by T, S and C, we get I:

$$\frac{T \times S \times C \times I}{T \times S \times C} = I$$

In practice, the cycle itself is so erratic and so interwoven with irregular movements that it is impossible to separate them. In the analysis of a time series into its component fluctuations, therefore, trend and seasonal movements are usually measured directly, while cyclical and irregular fluctuations are left together after the other elements have been removed.

* Trend influences are not removed under this method.
MISCELLANEOUS ILLUSTRATIONS 


Illustration 23. Obtain the straight line trend equation and, after estimating the trend, tabulate the trend values and the short-term fluctuations against each year:


Year   1960  1961  1962  1963  1964  1965  1966  1967  1968
Value  380   400   650   720   690   620   670   950   1040
                                                  (C. A. 1975)

Solution. CALCULATION OF TREND VALUES

Year   Value Y   X    XY      X²   Trend Yc   Short-term fluctuations (Y - Yc)
1960   380       -4   -1520   16   398.0      -18.0
1961   400       -3   -1200    9   468.5      -68.5
1962   650       -2   -1300    4   539.0      +111.0
1963   720       -1    -720    1   609.5      +110.5
1964   690        0       0    0   680.0      +10.0
1965   620        1     620    1   750.5      -130.5
1966   670        2    1340    4   821.0      -151.0
1967   950        3    2850    9   891.5      +58.5
1968   1040       4    4160   16   962.0      +78.0
N = 9  ΣY = 6120  ΣX = 0  ΣXY = 4230  ΣX² = 60  Σ(Y - Yc) = 0

$$Y_c = a + bX$$

Since ΣX = 0,

$$a = \frac{\Sigma Y}{N} = \frac{6120}{9} = 680, \qquad b = \frac{\Sigma XY}{\Sigma X^2} = \frac{4230}{60} = 70.5$$

$$Y_c = 680 + 70.5X$$

When X is -4, Yc will be

$$Y_c = 680 + 70.5(-4) = 398$$

The other trend values corresponding to each year can be obtained by adding the value of b to the preceding value.
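When the number of years is odd and X is measured from the middle year, ΣX = 0 and the normal equations reduce to a = ΣY/N and b = ΣXY/ΣX². A minimal Python sketch of this short-cut (the function name is illustrative; it assumes an odd number of consecutive years):

```python
def straight_line_trend(years, values):
    """Least-squares line Yc = a + bX, with X = year - middle year."""
    n = len(years)
    if n % 2 == 0:
        raise ValueError("this short-cut needs an odd number of years")
    origin = years[n // 2]
    xs = [yr - origin for yr in years]
    a = sum(values) / n                                                  # ΣY / N
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)  # ΣXY / ΣX²
    return a, b, origin


years = list(range(1960, 1969))
values = [380, 400, 650, 720, 690, 620, 670, 950, 1040]   # Illustration 23
a, b, origin = straight_line_trend(years, values)
trend = [a + b * (yr - origin) for yr in years]
print(a, b)                                       # 680.0 70.5
print([y - t for y, t in zip(values, trend)])     # short-term fluctuations
```

The same function handles Illustrations 24 and 26 below: for the profit data it gives a = 76 and b ≈ 4.857, so the 1978 estimate a + 4b is about 95.43.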


Illustration 24. The following are the annual profits (in thousands of rupees) of a certain firm:

Year     1971  1972  1973  1974  1975  1976  1977
Profits  60    72    75    65    80    85    95

By the method of least squares fit a straight line. Using it, estimate the profits for 1978.
                                      (B. Com., Madras 1977)

Solution. FITTING STRAIGHT LINE TREND

Year   Profits Y   X    XY     X²
1971   60          -3   -180   9
1972   72          -2   -144   4
1973   75          -1    -75   1
1974   65           0      0   0
1975   80           1     80   1
1976   85           2    170   4
1977   95           3    285   9
N = 7  ΣY = 532  ΣX = 0  ΣXY = 136  ΣX² = 28

The equation of the straight line trend is Yc = a + bX. Since ΣX = 0,

$$a = \frac{\Sigma Y}{N} = \frac{532}{7} = 76, \qquad b = \frac{\Sigma XY}{\Sigma X^2} = \frac{136}{28} = 4.857$$

$$Y_c = 76 + 4.857X$$

For 1978, X will be 4:

$$Y_{1978} = 76 + 4.857(4) = 76 + 19.428 = 95.428$$

Thus the likely profit for the year 1978 is Rs. 95,428.


Illustration 25. The following data relate to the number of passenger cars (in millions) sold from 1960 to 1967:

Years   Number   Years   Number
1960    6.7      1964    5.6
1961    5.3      1965    7.9
1962    4.3      1966    5.8
1963    6.1      1967    6.1

(a) Fit a straight line trend to the data through 1965 only.
(b) Use your result in (a) to estimate production in 1967 and compare with the actual production.
                                 (B.A. Hons. Econ., Delhi, 1974)

Solution. FITTING STRAIGHT LINE TREND

Years   No. of passenger cars Y   Deviations from 1965 X   XY      X²
1960    6.7                       -5                       -33.5   25
1961    5.3                       -4                       -21.2   16
1962    4.3                       -3                       -12.9    9
1963    6.1                       -2                       -12.2    4
1964    5.6                       -1                        -5.6    1
1965    7.9                        0                         0      0
1966    5.8                        1                         5.8    1
1967    6.1                        2                        12.2    4
N = 8   ΣY = 47.8   ΣX = -12   ΣXY = -67.4   ΣX² = 60

(a) The equation of the straight line trend is Y = a + bX. The normal equations are

$$\Sigma Y = Na + b\,\Sigma X, \qquad \Sigma XY = a\,\Sigma X + b\,\Sigma X^2$$

$$47.8 = 8a - 12b \qquad \text{(i)}$$

$$-67.4 = -12a + 60b \qquad \text{(ii)}$$

Multiplying Eqn. (i) by 3 and Eqn. (ii) by 2:

$$143.4 = 24a - 36b$$

$$-134.8 = -24a + 120b$$

Adding: 8.6 = 84b, so b = 8.6/84 = 0.102.

Putting the value of b in Eqn. (i):

$$47.8 = 8a - 12(0.102) = 8a - 1.224 \;\Rightarrow\; 8a = 49.024 \;\Rightarrow\; a = 6.128$$

Thus the required equation is

$$Y = 6.128 + 0.102X$$

(b) Estimate for 1967

For 1967, X is 2:

$$Y = 6.128 + 0.102(2) = 6.128 + 0.204 = 6.332$$

Thus the estimate for 1967 is 6.332 million cars. There is some difference between the actual sale figure, which is 6.1 million passenger cars, and the estimated figure. Some difference is bound to be there between the actual and estimated figures because estimates are based on certain assumptions; only rarely will actual and estimated figures coincide completely.
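When ΣX is not zero, as in this illustration, both normal equations must be solved together. A minimal Python sketch solving them by elimination (Cramer's rule); the function name is illustrative:

```python
def fit_line(xs, ys):
    """Solve ΣY = Na + bΣX and ΣXY = aΣX + bΣX² for a and b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx              # zero only if all X values are equal
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


xs = [-5, -4, -3, -2, -1, 0, 1, 2]                  # deviations from 1965
cars = [6.7, 5.3, 4.3, 6.1, 5.6, 7.9, 5.8, 6.1]     # Illustration 25
a, b = fit_line(xs, cars)
print(round(a, 3), round(b, 4), round(a + b * 2, 3))   # a, b and the 1967 estimate
```

Without the intermediate rounding of b to 0.102, the estimate for 1967 comes to about 6.333 rather than 6.332.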


Illustration 26. Fit a straight line trend of the type Y=a+bX by the ‘method of 
least squares to the following time series data. Calculate the trend values and depict 
them graphically. 


Years 1967 1968 1969 1970 1971 1972 1973 1974 1975 
Production 
(thousand tonnes) 11 13 15 14 15 17 18 


16 16 
(B.A. Hons., Econ., Delhi, 1975) 
Solution. CALCULATION OF TREND VALUES BY THE METHOD OF 


LEAST SQUARES HN 
Years Production Deviations from Trend 
(Thousand tonnes) 1961 x? values 
5 x Y. 

1967 11 —4 —44 16 12:068 
1968 13 =з —39 9 12'801 
1969 15 m2 —30 E 13:534 
1970 14 = —14 1 14°267 
1971 15 0 0 0 15'000 
1972 16 1 16 1 15'733 
1973 16 2 32 4 16'466 
1974 17 3 51 9 17199 
1975 18 4 72 16 17:932 
N-9  XY-135 Zx-0 УХҮ= £Y*-60 SY.=135 

Y = a + bX

Since ΣX = 0, the values of a and b can be calculated as follows:

$$a = \frac{\Sigma Y}{N} = \frac{135}{9} = 15, \qquad b = \frac{\Sigma XY}{\Sigma X^2} = \frac{44}{60} = 0.733$$

Y = 15 + 0.733X
When X = -4, Y will be
Y = 15 + 0.733(-4) = 15 - 2.932 = 12.068
The other trend values can be obtained by adding the value of b to the preceding value.


The graph of trend values is shown below:

[Figure: Graph of trend values obtained by the method of least squares; X-axis: Years, 1967 to 1975]
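The shortcut used in Illustration 26 (origin at the middle year, so that ΣX = 0) can be sketched in Python as below. This is a minimal illustration, assuming an odd number of equally spaced years; `straight_line_trend` is a hypothetical helper name, not a library function.

```python
def straight_line_trend(years, values):
    """Least-squares line Y = a + bX with X measured from the middle year.
    Assumes an odd number of equally spaced years, so that sum(X) = 0,
    a = sum(Y)/N and b = sum(XY)/sum(X^2)."""
    n = len(years)
    if n % 2 == 0:
        raise ValueError("this shortcut assumes an odd number of years")
    origin = years[n // 2]
    step = years[1] - years[0]
    xs = [(yr - origin) / step for yr in years]
    a = sum(values) / n
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    trend = [a + b * x for x in xs]
    return a, b, trend


years = list(range(1967, 1976))
production = [11, 13, 15, 14, 15, 16, 16, 17, 18]  # thousand tonnes
a, b, trend = straight_line_trend(years, production)
print(f"Y = {a:.0f} + {b:.3f}X")
for yr, yc in zip(years, trend):
    print(yr, round(yc, 3))
```

The trend values printed here use b = 44/60 without rounding, so they differ from the table only in the third decimal place.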
Illustration 27. You are given the population figures of India as follows:
Census year           1911  1921  1931  1941  1951  1961  1971
Population (crores)   25.0  25.1  27.9  31.9  36.1  43.9  54.7
Estimate the population figure for the year 1981 using an equation of the form y = ab^x, where x = years and y = population.
(M. Com., Delhi 1977)


Solution.
FITTING EQUATION OF THE FORM y = ab^x
Census   Population   Deviations    log Y    x²   x log Y
year     (crores) Y   from 1941 x
1911     25.0         -3            1.3979   9    -4.1937
1921     25.1         -2            1.3997   4    -2.7994
1931     27.9         -1            1.4456   1    -1.4456
1941     31.9          0            1.5038   0     0
1951     36.1          1            1.5575   1     1.5575
1961     43.9          2            1.6425   4     3.2850
1971     54.7          3            1.7380   9     5.2140
N = 7   ΣY = 244.6   Σx = 0   Σlog Y = 10.6850   Σx² = 28   Σx log Y = 1.6178


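Putting the equation y = ab^x in logarithmic form: log y = log a + x log b.
Since Σx = 0, the values of a and b can be determined as follows:

$$\log a = \frac{\Sigma \log Y}{N} = \frac{10.6850}{7} = 1.5264, \qquad \log b = \frac{\Sigma x \log Y}{\Sigma x^2} = \frac{1.6178}{28} = 0.0578$$

log Y = 1.5264 + 0.0578x
For 1981, x will be 4. When x = 4, log Y will be:
log Y = 1.5264 + 0.0578(4) = 1.7576
∴ Y = antilog 1.7576 = 57.23
Thus the estimated population for the year 1981 is 57.23 crores.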
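The same log-linear fit can be reproduced with a short Python sketch. It assumes, as the solution does, an odd number of equally spaced periods with x counted from the middle one, and it works with common (base-10) logarithms; `exponential_trend` is a hypothetical helper name.

```python
import math


def exponential_trend(values):
    """Fit y = a * b**x by least squares on log10(y), with x measured from the
    middle observation (assumes an odd number of equally spaced periods)."""
    n = len(values)
    if n % 2 == 0:
        raise ValueError("this shortcut assumes an odd number of periods")
    xs = list(range(-(n // 2), n // 2 + 1))
    logs = [math.log10(v) for v in values]
    log_a = sum(logs) / n
    log_b = sum(x * ly for x, ly in zip(xs, logs)) / sum(x * x for x in xs)
    return log_a, log_b


population = [25.0, 25.1, 27.9, 31.9, 36.1, 43.9, 54.7]  # crores, 1911 to 1971
log_a, log_b = exponential_trend(population)
x_1981 = 4  # four decades after the 1941 origin
estimate = 10 ** (log_a + log_b * x_1981)
print(f"log a = {log_a:.4f}, log b = {log_b:.4f}, 1981 estimate = {estimate:.2f} crores")
```

Because the logarithms are not rounded to four places, the estimate comes out a little below 57.23 (about 57.2), which is within the rounding of the hand calculation.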
Illustration 28. You are given the following trend equation:
Yt = 20 + 0.8X
Origin, 1972; X units, one year; Y units, production in million tonnes.
Shift the origin to Jan. 1, 1973.

Solution. The origin of 1972 is assumed to be located at the middle of 1972, that is, July 1, 1972. To change the origin to Jan. 1, 1973, one half of the annual increment b, i.e., 0.4, should be added to the trend value for July 1, 1972. Thus we have

Yt (Jan. 1, 1973) = 20 + 0.8(1/2) = 20 + 0.4 = 20.4
The trend equation now reads:
Yt = 20.4 + 0.8X
Origin, Jan. 1, 1973; X units, one year; Y units, production in million tonnes.
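Shifting the origin of a linear trend only moves the intercept: the new intercept is a + b × (shift in X-units), while the slope b is unchanged as long as X is still measured in years. A minimal sketch of this idea follows; the function name `shift_origin` is hypothetical.

```python
def shift_origin(a, b, shift):
    """Move the origin of a linear trend Y = a + bX forward by `shift` X-units.
    The slope is unchanged; only the intercept moves."""
    return a + b * shift, b


a_new, b_new = shift_origin(20, 0.8, 0.5)  # July 1, 1972 -> Jan. 1, 1973
print(f"Yt = {a_new:.1f} + {b_new:.1f}X")  # Yt = 20.4 + 0.8X
```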
Illustration 29. Calculate seasonal indices by the ratio-to-moving average method from the following data:

WHEAT PRICES IN RUPEES PER QUINTAL

Quarter   1971   1972   1973   1974
Q1          75     86     90    100
Q2          60     65     72     78
Q3          54     63     66     72
Q4          59     80     85     93
Solution. CALCULATION OF 4-QUARTERLY CENTERED MOVING AVERAGES

Year and   Wheat    4-figure   4-figure   2-figure      Centered       Percentage of
quarter    prices   moving     moving     moving        4-quarterly    centered 4-quarterly
                    totals     averages   totals        moving         moving average
                                          (of col. 4)   average        (col. 2 ÷ col. 6) × 100
(1)        (2)      (3)        (4)        (5)           (6)            (7)

1971 Q1     75
     Q2     60
                    248        62.00
     Q3     54                            126.75        63.4            85.2
                    259        64.75
     Q4     59                            130.75        65.4            90.2
                    264        66.00
1972 Q1     86                            134.25        67.1           128.2
                    273        68.25
     Q2     65                            141.75        70.9            91.7
                    294        73.50
     Q3     63                            148.00        74.0            85.1
                    298        74.50
     Q4     80                            150.75        75.4           106.1
                    305        76.25
1973 Q1     90                            153.25        76.6           117.5
                    308        77.00
     Q2     72                            155.25        77.6            92.8
                    313        78.25
     Q3     66                            159.00        79.5            83.0
                    323        80.75
     Q4     85                            163.00        81.5           104.3
                    329        82.25
1974 Q1    100                            166.00        83.0           120.5
                    335        83.75
     Q2     78                            169.50        84.8            92.0
                    343        85.75
     Q3     72
     Q4     93
CALCULATION OF SEASONAL INDEX

Quarter   1971    1972    1973    1974    Median   Seasonal index
1st         -     128.2   117.5   120.5   120.5    119.9
2nd         -      91.7    92.8    92.0    92.0     91.6
3rd       85.2     85.1    83.0     -      85.1     84.7
4th       90.2    106.1   104.3     -     104.3    103.8
Total                                     401.9    400.0

Since the medians total 401.9 instead of 400, each median is adjusted proportionately:
1st quarter seasonal index = (120.5/401.9) × 400 = 119.9
2nd quarter seasonal index = (92.0/401.9) × 400 = 91.6, etc.
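The ratio-to-moving-average steps (4-quarter moving averages, centring, percentages, medians, and adjustment to a total of 400) can be sketched in Python as follows. This is a minimal illustration assuming quarterly data that start in the first quarter; the helper names are hypothetical. It keeps the moving averages unrounded, so the last decimal of a ratio may occasionally differ from a hand table that rounds column (6) first.

```python
from statistics import median


def centred_ratios(prices, period=4):
    """Ratio (in %) of each observation to its centred moving average.
    For an even period, two successive period-length averages are averaged
    to centre them; positions without a centred average are None."""
    n = len(prices)
    averages = [sum(prices[i:i + period]) / period for i in range(n - period + 1)]
    half = period // 2
    ratios = [None] * n
    for i in range(len(averages) - 1):
        centred = (averages[i] + averages[i + 1]) / 2
        ratios[i + half] = 100 * prices[i + half] / centred
    return ratios


def seasonal_indices(prices, period=4):
    """Median ratio for each season, rescaled so that the indices total 100 * period."""
    ratios = centred_ratios(prices, period)
    medians = [median([r for r in ratios[s::period] if r is not None])
               for s in range(period)]
    scale = 100 * period / sum(medians)
    return [m * scale for m in medians]


# Quarterly wheat prices, 1971 Q1 to 1974 Q4 (Rs. per quintal)
prices = [75, 60, 54, 59, 86, 65, 63, 80, 90, 72, 66, 85, 100, 78, 72, 93]
indices = seasonal_indices(prices)
print([round(v, 1) for v in indices])
```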


Illustration 30. The seasonal indices of the sale of ready-made garments of a particular type in a certain store are given below:

Quarter               Seasonal Index
I    Jan.-March            98
II   April-June            89
III  July-Sept.            82
IV   Oct.-Dec.            130

If the total sales in the first quarter of a year are worth Rs. 10,000, determine how much worth of garments of this type should be kept in stock by the store to meet the demand in each of the remaining quarters. (M. Com., Delhi, 1974)
Solution. CALCULATION OF ESTIMATED SALES

Quarter       Seasonal Index   Sales or estimated sales (Rs.)
Jan.-March          98         10,000
April-June          89         (10,000/98) × 89  = 9,081.6
July-Sept.          82         (10,000/98) × 82  = 8,367.3
Oct.-Dec.          130         (10,000/98) × 130 = 13,265.3
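The calculation above simply scales the known first-quarter sales by the ratio of seasonal indices. A minimal Python sketch follows; the function name `stock_required` is hypothetical.

```python
def stock_required(known_sales, known_index, indices):
    """Estimate each quarter's sales as known_sales * (quarter index / known index)."""
    return {q: known_sales * idx / known_index for q, idx in indices.items()}


indices = {"Jan.-March": 98, "April-June": 89, "July-Sept.": 82, "Oct.-Dec.": 130}
estimates = stock_required(10_000, indices["Jan.-March"], indices)
for quarter, value in estimates.items():
    print(f"{quarter:12s} Rs. {value:,.1f}")
```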


Illustration 31. Compute the seasonal averages, seasonal variations and seasonal indices for the following time series:

Month    1974   1975   1976
Jan.      15     23     25
Feb.      16     22     25
March     18     28     35
Apl.      18     27     36
May       23     31     36
June      23     28     30
July      20     22     30
Aug.      28     28     34
Sept.     29     32     38
Oct.      33     37     47
Nov.      33     34     41
Dec.      38     44     53

(B. Com. Bombay Opt., 1976)

Solution.

CALCULATION OF SEASONAL INDICES BY THE METHOD OF MONTHLY AVERAGES

Month     1974   1975   1976   Total   Average   Percentage
(1)       (2)    (3)    (4)    (5)     (6)       (7)
Jan.       15     23     25      63      21        70
Feb.       16     22     25      63      21        70
March      18     28     35      81      27        90
Apl.       18     27     36      81      27        90
May        23     31     36      90      30       100
June       23     28     30      81      27        90
July       20     22     30      72      24        80
Aug.       28     28     34      90      30       100
Sept.      29     32     38      99      33       110
Oct.       33     37     47     117      39       130
Nov.       33     34     41     108      36       120
Dec.       38     44     53     135      45       150
Total                          1080     360      1200
Average                          90      30       100


Col. 5 gives the total for each month for 3 years. In col. 6 each total of col. 5 has been divided by 3 to obtain an average for each month. The average of monthly averages is obtained by dividing the total of monthly averages by 12 (360/12 = 30). In col. 7 each monthly average has been expressed as a percentage of the average of monthly averages. Thus the percentage for January is:

(21/30) × 100 = 70

Similarly the percentage for Feb. is

(21/30) × 100 = 70, etc.
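The method of monthly averages reduces to three steps: average each month over the years, average those twelve averages, and express each monthly average as a percentage of the overall average. A minimal Python sketch with the data of Illustration 31 follows; `monthly_average_indices` is a hypothetical helper name.

```python
def monthly_average_indices(data):
    """Seasonal indices by the method of monthly averages.
    Returns (indices, grand) where grand is the average of the monthly averages."""
    averages = {m: sum(v) / len(v) for m, v in data.items()}
    grand = sum(averages.values()) / len(averages)
    return {m: 100 * avg / grand for m, avg in averages.items()}, grand


data = {
    "Jan.": [15, 23, 25], "Feb.": [16, 22, 25], "March": [18, 28, 35],
    "Apl.": [18, 27, 36], "May": [23, 31, 36], "June": [23, 28, 30],
    "July": [20, 22, 30], "Aug.": [28, 28, 34], "Sept.": [29, 32, 38],
    "Oct.": [33, 37, 47], "Nov.": [33, 34, 41], "Dec.": [38, 44, 53],
}
indices, grand = monthly_average_indices(data)
print("Average of monthly averages:", grand)
for month, idx in indices.items():
    print(month, round(idx))
```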
Many times, in practical work we come across situations where we have to estimate a value which is not available in the given series or predict a future value. For example, the census of population in India takes place every 10 years, i.e., we have the census figures for 1921, 1931, 1941, 1951, 1961 and 1971. Now if we require population figures for 1965 or 1980, what should we do? To talk of a census for 1965 or 1980 in 1977 is impracticable. One way out is pure guesswork, but that may be highly deceptive. What is desirable is to obtain the required estimates by analysing the available data. The techniques of interpolation and extrapolation are extremely helpful in estimating missing values or projecting future values. Interpolation, thus, refers to the insertion of an intermediate value in a series of items, whereas extrapolation refers to projecting a value for the future. Interpolation supplies us with the missing link, whereas extrapolation helps in forecasting.


Significance of Interpolation and Extrapolation 


The tools of interpolation and extrapolation are of great practical use. Their utility will be clear from the following:

1. It often happens that a particular type of information is collected at regular intervals, such as census data. Suppose we need the population figure for, say, 1958 or 1980; it would be impracticable to conduct a census for these years. The only alternative is to make use of the techniques of interpolation and extrapolation.

2. The technique of interpolation is also used where a part of the data is destroyed or missing. For example, we may be studying the figures of sales of a firm from 1957 to 1972 and find that for a particular year, say 1960, data are not available because the records are missing or lost. Such a figure may be obtained with the help of interpolation. Interpolation is thus helpful in filling up the gaps in available data.

Despite the great significance of interpolation and extrapolation, it should be noted that they give us only the most likely estimates under certain assumptions. The reader should not form the impression that the figures obtained by these techniques must be 100% correct in practice. Variations between actual and estimated values are quite natural. For example, on the basis of census figures of 1921 to 1971 for India we might get the population for 1981 as 700 million, whereas the population obtained by actual census taking may be different, say 725 million. Though the two figures differ, we should remember that the estimate made is likely to be superior to one's judgment or imagination of the population for 1981. Thus the interpolated or extrapolated values are only the best possible estimates under certain assumptions; they are not substitutes for actual values.

The accuracy of interpolation depends upon (1) knowledge of the possible fluctuations of the figures, obtained by a general inspection of the fluctuations at the dates for which they are given, and (2) knowledge of the course of events with which the figures are connected.
If by looking at the data we find that the fluctuations in the series are regular, we can expect quite accurate estimates. Moreover, if the investigator has knowledge of the special factors affecting the phenomena under study, he can obtain more reliable estimates by making an allowance for these special factors in the figures interpolated. For example, if we are extrapolating the figure of production of wheat for 1975 and we know that 1975 is likely to be a very bad year from the point of view of rainfall, we can revise the extrapolated figure downward to account for this special factor.


Assumptions 


The following assumptions are made while making use of the techniques of interpolation and extrapolation:

1. There are no sudden jumps in the series from one period to another. While interpolating a value we always presume that there are no sudden ups and downs in the data or, in other words, that the data depict some sort of continuity. For example, if we are given the population figures for 1921, 31, 41, 51, 61 and 71 and we are asked to interpolate the figure for 1968, this would be done on the assumption that throughout the period 1921-1971 there have been no violent changes in population. While extrapolating a value the same assumption would apply, i.e., that there is no likelihood of sudden changes in future. However, in many cases this assumption may not hold good and our estimates may be faulty.

2. Another assumption that we make while interpolating or extrapolating values is that the rate of change of figures from one period to another is uniform. Thus in the above illustration our assumption would be that from 1921 to 1971 the growth of population has been uniform. This assumption again may not hold good in practice in many cases.


METHODS OF INTERPOLATION

Broadly speaking, the various methods of interpolation can be grouped under two heads:

1. Graphic Method, and

2. Algebraic Methods.

Under the head algebraic methods, we have several formulae. The following are some of the important and more popular methods:

1. Binomial Expansion Method.

2. Newton's Method.

3. Lagrange's Method.

4. Parabolic Curve Method.

Each one of these methods is appropriate in a certain set of circumstances, which are described below:


1. Graphic Method

This is the simplest of all the methods of interpolation. When this method is used, the given data are plotted on a graph paper and the plotted points are joined. When there are only two values, the points are joined by a straight line; otherwise a curve is obtained. On the X-axis we take the years and on the Y-axis the values of the variable. From the period for which the value is to be interpolated, a perpendicular is drawn to the line (or curve). From the point where it meets the line, another perpendicular is drawn to the Y-axis. The corresponding value of the variable is read off, which is the required value. The following example shall illustrate the method:

Illustration 1. From the following data determine the population for the year 1956. Also find out the increase in population between 1946 and 1956.

Year                     1921  1931  1941  1951  1961
Population (millions)     251   279   319   361   439

Solution. The graph shows that the population for 1956 was 400 million and that the increase in population between 1946 and 1956 was 60 million, i.e., (400 - 340).

[Figure: Interpolating population; X-axis: Years, 1921 to 1961]

It is possible to interpolate graphically the change in the value of the variable over a certain range. For example, if we are interested in ascertaining the increase in population from 1946 to 1956, we can easily do so by drawing two perpendiculars, one from 1946 and another from 1956, and then reading the difference on the Y-axis. Thus, for the above data, the increase in population between 1946 and 1956 is 60 million, as shown in the graph.

Although the graphic method is the simplest, it has one limitation: one cannot be very accurate with it. The larger the volume of figures, the narrower the scale has to be on the graph and consequently the greater will be the error of approximation. For example, in the above case 4" represents 100 million people and hence one cannot read very accurately on the graph.


Illustration 2. The following table gives the profit of a firm for the period 1961 to 1966. The figure for 1965 is missing. Interpolate the same by the graphic method:

Year                   1961  1962  1963  1964  1965  1966
Profits (Rs. lakhs)     108   113   111   110    ?    114

(B. Com., Osmania, 1972)

Solution. Take years on the X-axis and profit on the Y-axis and plot the given data. Join the various points. From 1965 draw a perpendicular to the curve, and from the point where it cuts the curve draw another perpendicular to the Y-axis, which gives the required value, i.e., 112 lakh rupees.

[Figure: Interpolating profits; X-axis: Years]




E-154 INTERPOLATION AND EXTRAPOLATION 


2. Binomial Expansion Method

This method of interpolation is simple to understand and requires very little calculation. However, it is applicable only in those situations where the following two conditions are satisfied:

1. The x-variable advances by equal intervals, say 5, 10, 15, 20, 25, etc. If the increase is not uniform, e.g., if x is 5, 8, 13, 15, 24, etc., this method cannot be applied.

2. The value of x for which y is to be interpolated is one of the class limits of the x series. For example, observe the following data:

x : 5   10   15   20   25
yir. 1300 734 T 38 40 

We can determine the value of y corresponding to x = 15 but not corresponding to x = 12 or 18. The same is true for extrapolation, i.e., we can extrapolate the value for x = 30 but not for x = 28.

When this method is applied we expand the binomial (y - 1)^n and equate it to zero (in the expansion, the power of y is written as a subscript):

$$(y-1)^n = y_n - n\,y_{n-1} + \frac{n(n-1)}{2!}\,y_{n-2} - \frac{n(n-1)(n-2)}{3!}\,y_{n-3} + \cdots + (-1)^n y_0 = 0$$

where n is the number of known values of y.

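Thus we have the following results:

$$
\begin{array}{cl}
\text{No. of known values} & \text{Equation for determining the unknown value} \\
3 & \Delta^3 y_0 = y_3 - 3y_2 + 3y_1 - y_0 = 0 \\
4 & \Delta^4 y_0 = y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0 \\
5 & \Delta^5 y_0 = y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0 \\
6 & \Delta^6 y_0 = y_6 - 6y_5 + 15y_4 - 20y_3 + 15y_2 - 6y_1 + y_0 = 0 \\
7 & \Delta^7 y_0 = y_7 - 7y_6 + 21y_5 - 35y_4 + 35y_3 - 21y_2 + 7y_1 - y_0 = 0 \\
8 & \Delta^8 y_0 = y_8 - 8y_7 + 28y_6 - 56y_5 + 70y_4 - 56y_3 + 28y_2 - 8y_1 + y_0 = 0
\end{array}
$$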
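Each row of the table above uses the coefficients (-1)^(n-k) C(n, k), so the single unknown can be found by one division. The following is a minimal Python sketch of that rule, assuming equally spaced x values and exactly one gap marked `None`; `fill_one_missing` is a hypothetical helper name.

```python
from math import comb


def fill_one_missing(ys):
    """Binomial expansion method: with n known values and one gap (None),
    assume the n-th leading difference is zero,
        sum over k of (-1)**(n - k) * C(n, k) * y_k = 0,
    and solve that single equation for the gap."""
    if ys.count(None) != 1:
        raise ValueError("exactly one missing value expected")
    n = len(ys) - 1  # number of known values
    known_part = 0.0
    gap_coeff = 0
    for k, y in enumerate(ys):
        coeff = (-1) ** (n - k) * comb(n, k)
        if y is None:
            gap_coeff = coeff
        else:
            known_part += coeff * y
    return -known_part / gap_coeff


print(fill_one_missing([39, 85, None, 151, 264, 388]))   # cement output, 1966 to 1976
print(fill_one_missing([0.7, 2.1, 3.5, None, 5.7, 5.8]))  # children per mother by age group
```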


Illustration 3. The following table gives the quantity of cement, in thousands of tonnes, manufactured in India in the year X. Find the probable production for the year 1970.

Year X                       1966  1968  1970  1972  1974  1976
Quantity Y ('000 tonnes)       39    85     ?   151   264   388

(B. Com., Madras, 1977)

Solution. Since the known values are five, the fifth leading difference* will be zero, i.e., (y - 1)^5 = 0 or Δ⁵y₀ = 0:
Δ⁵y₀ = y₅ - 5y₄ + 10y₃ - 10y₂ + 5y₁ - y₀ = 0

X       Y
1966     39   y₀
1968     85   y₁
1970      ?   y₂
1972    151   y₃
1974    264   y₄
1976    388   y₅

We have to determine the value of y₂. Substituting the values in the above expansion:
388 - 5(264) + 10(151) - 10y₂ + 5(85) - 39 = 0
388 - 1320 + 1510 - 10y₂ + 425 - 39 = 0
-10y₂ = -388 + 1320 - 1510 - 425 + 39
-10y₂ = -964,  or  y₂ = 96.4

Thus the probable production of cement for the year 1970 is 96.4 thousand tonnes.

* For an explanation of differences, please refer to Newton's method of interpolation.

Illustration 4. The age of mothers and the average number of children per mother are given in the table below:

Age of mother   Average no. of      Age of mother   Average no. of
(in years)      children born       (in years)      children born
15-19           0.7                 30-34           ?
20-24           2.1                 35-39           5.7
25-29           3.5                 40-44           5.8

Interpolate the average number of children born per mother aged 30-34.
(M. Com., Agra, 1972)

Solution. INTERPOLATION OF THE AVERAGE NUMBER OF CHILDREN BORN PER MOTHER AGED 30-34

Age of mother (in yrs.)   No. of children born
15-19                     0.7   y₀
20-24                     2.1   y₁
25-29                     3.5   y₂
30-34                     ?     y₃
35-39                     5.7   y₄
40-44                     5.8   y₅

Since the known figures are five, the fifth leading difference will be zero:
Δ⁵y₀ = y₅ - 5y₄ + 10y₃ - 10y₂ + 5y₁ - y₀ = 0
Substituting the given values:
5.8 - 5(5.7) + 10y₃ - 10(3.5) + 5(2.1) - 0.7 = 0
5.8 - 28.5 + 10y₃ - 35 + 10.5 - 0.7 = 0
10y₃ = 28.5 - 5.8 + 35 + 0.7 - 10.5 = 47.9,  or  y₃ = 4.79
Thus the expected average number of children born per mother aged 30-34 is 4.79, or 4.8.
Two or more missing values. Where two or more values are missing, the binomial expansion method can easily be applied. When two values are missing in a series, we get two unknown quantities in the equation obtained by the binomial expansion. In such a case, if we are given n values, we assume that the (n-1)th differences are constant, i.e., we assume that

$$\Delta^{n-1} y_0,\ \Delta^{n-1} y_1,\ \Delta^{n-1} y_2,\ \ldots \text{ are constant.}$$

If the (n-1)th differences are constant, the nth differences are zero, i.e.,

$$\Delta^{n} y_0 = 0, \quad \Delta^{n} y_1 = 0, \text{ and so on.}$$

Solving Δⁿy₀ = 0 and Δⁿy₁ = 0, we get the values of the two unknowns. The following example shall illustrate the procedure:

Illustration 5. Estimate the production for the years 1955 and 1965 with the help of the following table:

Years   Production ('000 tonnes)       Years   Production ('000 tonnes)
1940    200   y₀                       1960    350   y₄
1945    220   y₁                       1965    ?     y₅
1950    260   y₂                       1970    430   y₆
1955    ?     y₃

(M. Com., Agra, 1972)

Solution. Since five figures are known, we shall assume that the fifth order differences are zero. As there are two unknown figures, two equations will be required to determine them. They are:
Δ⁵y₀ = y₅ - 5y₄ + 10y₃ - 10y₂ + 5y₁ - y₀ = 0, and
Δ⁵y₁ = y₆ - 5y₅ + 10y₄ - 10y₃ + 5y₂ - y₁ = 0
Substituting the values:
y₅ - 1750 + 10y₃ - 2600 + 1100 - 200 = 0     ...(i)
430 - 5y₅ + 3500 - 10y₃ + 1300 - 220 = 0     ...(ii)
or   y₅ + 10y₃ = 3450
    5y₅ + 10y₃ = 5010
Subtracting the first equation from the second: 4y₅ = 1560, or y₅ = 390.
Substituting the value of y₅ in the first equation:
390 + 10y₃ = 3450, or 10y₃ = 3060, or y₃ = 306

Thus the missing values corresponding to 1955 and 1965 are 306 and 390 thousand tonnes respectively.
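With two gaps the procedure gives two linear equations, Δⁿy₀ = 0 and Δⁿy₁ = 0, which can be solved together. Below is a minimal Python sketch, assuming n known equally spaced values and exactly two gaps (so the list has n + 2 entries); `fill_two_missing` is a hypothetical helper name, and Cramer's rule replaces the subtraction used by hand.

```python
from math import comb


def fill_two_missing(ys):
    """Binomial expansion method for two gaps (None) among n known values.
    Assumes the n-th differences vanish and solves
        Delta^n y_0 = 0  and  Delta^n y_1 = 0
    simultaneously by Cramer's rule. Expects len(ys) == n + 2."""
    gaps = [i for i, y in enumerate(ys) if y is None]
    n = len(ys) - len(gaps)
    if len(gaps) != 2 or len(ys) != n + 2:
        raise ValueError("expected n known values and exactly two gaps")
    rows = []
    for start in (0, 1):
        coeffs = [0.0, 0.0]  # coefficients of the two unknowns
        rhs = 0.0
        for k in range(n + 1):
            c = (-1) ** (n - k) * comb(n, k)
            y = ys[start + k]
            if y is None:
                coeffs[gaps.index(start + k)] += c
            else:
                rhs -= c * y
        rows.append((coeffs, rhs))
    (a11, a12), r1 = rows[0]
    (a21, a22), r2 = rows[1]
    det = a11 * a22 - a12 * a21
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - a21 * r1) / det


y1955, y1965 = fill_two_missing([200, 220, 260, None, 350, None, 430])
print(y1955, y1965)
```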


3. Newton's Method
A number of formulae were given by Newton to be applied in different situations. Some of these formulae are:
(i) Newton's Advancing Difference Method.
(ii) Newton-Gauss (Forward) Method.
(iii) Newton-Gauss (Backward) Method.
(iv) Newton's Divided Difference Formula.
(i) Newton's Advancing Difference Method
This method is applicable in those cases where the independent variable x increases by equal intervals like 10, 20, 30, 40, etc. However, unlike the binomial expansion method, it is not necessary here that the value of x for which y is to be interpolated is one of the class limits of the x series. For example, if the given data are:

x      y
10    100   y₀
20    120   y₁
30    130   y₂
40    140   y₃
50    140   y₄

we can interpolate the value of y for x = 25 or 32, etc. Similarly, we can extrapolate the value for x = 57.


The formula for interpolation is:

$$y_x = y_0 + x\,\Delta^1 y_0 + \frac{x(x-1)}{2!}\,\Delta^2 y_0 + \frac{x(x-1)(x-2)}{3!}\,\Delta^3 y_0 + \cdots$$

where y₀ represents the value of y at the origin, yₓ represents the figure to be interpolated, and the Δ's are the differences. The value of x is obtained as follows:

$$x = \frac{\text{Value to be interpolated} - \text{Value at origin}}{\text{Difference between two adjoining values}}$$

If in the above example we are to compute the value of y for x = 25, the value of x shall be obtained as follows:

$$x = \frac{25 - 10}{20 - 10} = \frac{15}{10} = 1.5$$

In case we are given years and the values of the y variable, then

$$x = \frac{\text{Year of interpolation} - \text{Year of origin}}{\text{Time difference between two adjoining years}}$$

While applying this method the differences between the various values of y are to be calculated. The differences are indicated by the sign Δ (pronounced delta). Thus the first differences are indicated by Δ¹, second differences by Δ², third differences by Δ³, and so on. The first difference in each column is called the Leading Difference. The following is the table of differences:

TABLE SHOWING FINITE OR ADVANCING DIFFERENCES

x    y     First                Second                 Third                  Fourth
           differences          differences            differences            differences
x₀   y₀
           y₁ - y₀ = Δ¹y₀
x₁   y₁                         Δ¹y₁ - Δ¹y₀ = Δ²y₀
           y₂ - y₁ = Δ¹y₁                              Δ²y₁ - Δ²y₀ = Δ³y₀
x₂   y₂                         Δ¹y₂ - Δ¹y₁ = Δ²y₁                            Δ³y₁ - Δ³y₀ = Δ⁴y₀
           y₃ - y₂ = Δ¹y₂                              Δ²y₂ - Δ²y₁ = Δ³y₁
x₃   y₃                         Δ¹y₃ - Δ¹y₂ = Δ²y₂
           y₄ - y₃ = Δ¹y₃
x₄   y₄


The above table clearly shows that there is a relationship between the various differences and the values of y. If we know the values of y we can calculate the differences and, conversely, if we know the values of the various differences we can find out the values of y. The relationship is of the following type:

$$
\begin{aligned}
\Delta^1 y_0 &= y_1 - y_0\\
\Delta^2 y_0 &= \Delta^1 y_1 - \Delta^1 y_0 = (y_2 - y_1) - (y_1 - y_0) = y_2 - 2y_1 + y_0\\
\Delta^3 y_0 &= \Delta^2 y_1 - \Delta^2 y_0 = (y_3 - 2y_2 + y_1) - (y_2 - 2y_1 + y_0) = y_3 - 3y_2 + 3y_1 - y_0
\end{aligned}
$$

From the above relationship it is clear (using Δ¹y₁ = Δ¹y₀ + Δ²y₀ and Δ¹y₂ = Δ¹y₀ + 2Δ²y₀ + Δ³y₀) that

$$
\begin{aligned}
y_0 &= y_0\\
y_1 &= y_0 + \Delta^1 y_0\\
y_2 &= y_1 + \Delta^1 y_1 = (y_0 + \Delta^1 y_0) + (\Delta^1 y_0 + \Delta^2 y_0) = y_0 + 2\Delta^1 y_0 + \Delta^2 y_0\\
y_3 &= y_2 + \Delta^1 y_2 = (y_0 + 2\Delta^1 y_0 + \Delta^2 y_0) + (\Delta^1 y_0 + 2\Delta^2 y_0 + \Delta^3 y_0) = y_0 + 3\Delta^1 y_0 + 3\Delta^2 y_0 + \Delta^3 y_0
\end{aligned}
$$

The numerical coefficients in the expressions for y₀, y₁, y₂ and y₃ are respectively:
1;   1, 1;   1, 2, 1;   1, 3, 3, 1
These are the terms of the binomial expansions of (1 + 1)⁰, (1 + 1)¹, (1 + 1)² and (1 + 1)³. From this we can generalize as follows:

$$y_x = y_0 + x\,\Delta^1 y_0 + \frac{x(x-1)}{2!}\,\Delta^2 y_0 + \frac{x(x-1)(x-2)}{3!}\,\Delta^3 y_0 + \cdots$$

This important expansion is called Newton's formula for interpolation.

Applicability. This formula should be used when the figure to be interpolated is near the beginning of the table. The reason is that in this formula we take only the leading differences into account, and such differences, as we have seen, are always at the beginning of the table.
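Newton's advancing-difference formula can be evaluated by building the leading differences and accumulating the coefficients x(x-1)(x-2).../k! one factor at a time. The following is a minimal Python sketch, assuming equally spaced x values; `forward_differences` and `newton_forward` are hypothetical helper names. All available leading differences are used, as in the worked illustrations.

```python
def forward_differences(ys):
    """Leading differences [y0, D1y0, D2y0, ...] of an equally spaced table."""
    leading = []
    row = list(ys)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading


def newton_forward(xs, ys, x_target):
    """Newton's advancing-difference formula:
    y = y0 + x D1y0 + x(x-1)/2! D2y0 + x(x-1)(x-2)/3! D3y0 + ...
    where x = (x_target - x0) / h."""
    h = xs[1] - xs[0]
    u = (x_target - xs[0]) / h
    total = 0.0
    term = 1.0  # holds u(u-1)...(u-k+1)/k! for the current k
    for k, delta in enumerate(forward_differences(ys)):
        if k > 0:
            term *= (u - (k - 1)) / k
        total += term * delta
    return total


print(newton_forward([20, 25, 30, 35], [73, 198, 573, 1198], 22))      # Illustration 6
print(newton_forward([20, 25, 30, 35, 40], [23, 26, 30, 35, 42], 26))  # Illustration 7
```

The same function applies to the cumulative-frequency illustrations that follow, e.g. `newton_forward([10, 20, 30, 40, 50], [20, 45, 115, 210, 325], 25)` for the number of persons with income below Rs. 25.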

Illustration 6. Given the following pairs of values of x and y:
x :  20   25   30    35
y :  73  198  573  1,198
Estimate the value of y for x = 22.

Solution. APPLYING NEWTON'S METHOD

                          Differences
x      y            First (Δ¹)   Second (Δ²)   Third (Δ³)
20      73   y₀
                     125  Δ¹y₀
25     198   y₁                   250  Δ²y₀
                     375  Δ¹y₁                   0  Δ³y₀
30     573   y₂                   250  Δ²y₁
                     625  Δ¹y₂
35   1,198   y₃

yₓ = y₀ + xΔ¹y₀ + [x(x - 1)/2!]Δ²y₀ + [x(x - 1)(x - 2)/3!]Δ³y₀, and x = (22 - 20)/5 = 0.4
y₂₂ = 73 + 0.4(125) + [0.4(0.4 - 1)/2](250) + [0.4(0.4 - 1)(0.4 - 2)/6](0)
    = 73 + 50 - 30 + 0 = 93
Thus the estimated value of y for x = 22 is 93.


Illustration 7. The following are the annual premiums charged by the Life Insurance Corporation of India for a policy of Rs. 1,000. Calculate the premium payable at the age of 26.
Age in years  :  20  25  30  35  40
Premium (Rs.) :  23  26  30  35  42
(M. Com., Agra, 1975)

Solution. APPLYING NEWTON'S METHOD

Age in      Premium               Differences
years (x)   (y)           Δ¹     Δ²     Δ³     Δ⁴
20          23   y₀
                           3
25          26   y₁               1
                           4             0
30          30   y₂               1             1
                           5             1
35          35   y₃               2
                           7
40          42   y₄

yₓ = y₀ + xΔ¹y₀ + [x(x - 1)/2!]Δ²y₀ + [x(x - 1)(x - 2)/3!]Δ³y₀ + [x(x - 1)(x - 2)(x - 3)/4!]Δ⁴y₀, and x = (26 - 20)/5 = 1.2
y₂₆ = 23 + 1.2(3) + [1.2(1.2 - 1)/2](1) + [1.2(1.2 - 1)(1.2 - 2)/(3 × 2)](0) + [1.2(1.2 - 1)(1.2 - 2)(1.2 - 3)/(4 × 3 × 2)](1)
    = 23 + 3.6 + 0.12 + 0 + 0.0144 = 26.7344, or Rs. 26.73
Thus the premium payable on Rs. 1,000 at the age of 26 is Rs. 26.73.


Illustration 8. From the following table find the number of students who obtained less than 45 marks:

Marks           : 30-40  40-50  50-60  60-70  70-80
No. of students :   31     42     51     35     31

(I.C.W.A., 1972)

Solution. Applying Newton's method for interpolating the number of students who obtained less than 45 marks:

Marks       No. of                Differences
less than   students      Δ¹     Δ²     Δ³     Δ⁴
(x)         (y)
40           31   y₀
                           42
50           73   y₁                9
                           51             -25
60          124   y₂              -16             37
                           35              12
70          159   y₃               -4
                           31
80          190   y₄

yₓ = y₀ + xΔ¹y₀ + [x(x - 1)/2!]Δ²y₀ + [x(x - 1)(x - 2)/3!]Δ³y₀ + [x(x - 1)(x - 2)(x - 3)/4!]Δ⁴y₀, and x = (45 - 40)/10 = 0.5
Substituting the values:
y₄₅ = 31 + 0.5(42) + [0.5(0.5 - 1)/(1 × 2)](9) + [0.5(0.5 - 1)(0.5 - 2)/(1 × 2 × 3)](-25) + [0.5(0.5 - 1)(0.5 - 2)(0.5 - 3)/(1 × 2 × 3 × 4)](37)
    = 31 + 21 - 1.125 - 1.5625 - 1.4453 = 47.867, or 48
Thus the number of students who obtained less than 45 marks is 48.


Illustration 9. Below are given the wages earned by workers per month in a certain factory. Calculate the number of workers earning more than Rs. 75 per month.

Monthly income   No. of workers     Monthly income   No. of workers
up to Rs. 50          50            up to Rs. 80          500
up to Rs. 60         150            up to Rs. 90          700
up to Rs. 70         300            up to Rs. 100         800

(B. Com., Lucknow, 1972)

Solution. Applying Newton's method to ascertain the number of workers earning more than Rs. 75 per month:

Monthly income   Number of                  Differences
up to (Rs.)      workers       Δ¹     Δ²     Δ³     Δ⁴     Δ⁵
 50               50   y₀
                              100
 60              150   y₁             50
                              150              0
 70              300   y₂             50             -50
                              200            -50             0
 80              500   y₃              0             -50
                              200           -100
 90              700   y₄           -100
                              100
100              800   y₅

yₓ = y₀ + xΔ¹y₀ + [x(x - 1)/2!]Δ²y₀ + [x(x - 1)(x - 2)/3!]Δ³y₀ + [x(x - 1)(x - 2)(x - 3)/4!]Δ⁴y₀ + [x(x - 1)(x - 2)(x - 3)(x - 4)/5!]Δ⁵y₀, and x = (75 - 50)/10 = 2.5
y₇₅ = 50 + 2.5(100) + [2.5(2.5 - 1)/2](50) + [2.5(2.5 - 1)(2.5 - 2)/(3 × 2)](0) + [2.5(2.5 - 1)(2.5 - 2)(2.5 - 3)/(4 × 3 × 2)](-50) + [2.5(2.5 - 1)(2.5 - 2)(2.5 - 3)(2.5 - 4)/(5 × 4 × 3 × 2)](0)
    = 50 + 250 + 93.75 + 0 + 1.95 + 0 = 395.7, or 396

Thus the number of workers earning up to Rs. 75 is 396. The total number of workers is 800; hence the number of workers earning more than Rs. 75 is (800 - 396) = 404.


Illustration 10. From the following data of the wages of 500 workers of a factory, find the number of workers:
(a) whose wages are more than Rs. 170 but not more than Rs. 200,
(b) whose wages are less than Rs. 170 but not less than Rs. 150.
Wages not exceeding Rs. 100 : 150 workers
Wages not exceeding Rs. 150 : 180 workers
Wages not exceeding Rs. 200 : 240 workers
Wages not exceeding Rs. 250 : 400 workers
Wages not exceeding Rs. 300 : 500 workers

Solution. Interpolating the number of workers whose wages do not exceed Rs. 170 by applying Newton's method:

Wages not         No. of               Differences
exceeding (Rs.)   workers      Δ¹     Δ²     Δ³     Δ⁴
100               150   y₀
                               30
150               180   y₁            30
                               60             70
200               240   y₂           100           -230
                              160           -160
250               400   y₃           -60
                              100
300               500   y₄

yₓ = y₀ + xΔ¹y₀ + [x(x - 1)/2!]Δ²y₀ + [x(x - 1)(x - 2)/3!]Δ³y₀ + [x(x - 1)(x - 2)(x - 3)/4!]Δ⁴y₀, and x = (170 - 100)/50 = 1.4
Substituting the values of y₀, Δ¹y₀, Δ²y₀, ... in the formula:
y₁₇₀ = 150 + 1.4(30) + [1.4(1.4 - 1)/2](30) + [1.4(1.4 - 1)(1.4 - 2)/(3 × 2)](70) + [1.4(1.4 - 1)(1.4 - 2)(1.4 - 3)/(4 × 3 × 2)](-230)
     = 150 + 42 + 8.4 - 3.92 - 5.15 = 191.33, or 191
(a) Thus the number of workers whose wages do not exceed Rs. 170 is 191, and the number of workers whose wages do not exceed Rs. 200 is 240.
∴ The number of workers whose wages are more than Rs. 170 but not more than Rs. 200 is (240 - 191) = 49.
(b) The number of workers getting not more than Rs. 170 is 191, and the number of workers getting not more than Rs. 150 is 180.
∴ The number of workers getting less than Rs. 170 but not less than Rs. 150 is (191 - 180) = 11.
Illustration 11. From the following data estimate the number of persons in the income group of Rs. 20 to Rs. 25.

Income          Number of persons     Income          Number of persons
Below Rs. 10          20              Below Rs. 40          210
Below Rs. 20          45              Below Rs. 50          325
Below Rs. 30         115

(B. Com., Nagpur, 1972)

Solution. APPLYING NEWTON'S METHOD

Income        Number of            Differences
below (Rs.)   persons       First   Second   Third   Fourth
(x)           (y)           Δ¹      Δ²       Δ³      Δ⁴
10             20   y₀
                             25
20             45   y₁               45
                             70               -20
30            115   y₂               25                15
                             95                -5
40            210   y₃               20
                            115
50            325   y₄

yₓ = y₀ + xΔ¹y₀ + [x(x - 1)/2!]Δ²y₀ + [x(x - 1)(x - 2)/3!]Δ³y₀ + [x(x - 1)(x - 2)(x - 3)/4!]Δ⁴y₀
Let us calculate the number of persons who are earning less than Rs. 25: x = (25 - 10)/10 = 1.5
y₂₅ = 20 + 1.5(25) + [1.5(1.5 - 1)/2](45) + [1.5(1.5 - 1)(1.5 - 2)/(3 × 2)](-20) + [1.5(1.5 - 1)(1.5 - 2)(1.5 - 3)/(4 × 3 × 2)](15)
    = 20 + 37.5 + 16.875 + 1.25 + 0.3516 = 75.9766, or 76

Thus the number of persons earning less than Rs. 25 is 76 and the number of persons earning less than Rs. 20 is 45 (given). Hence the number of persons earning between Rs. 20 and Rs. 25 is (76 - 45) = 31.


(ii) Newton-Gauss (Forward) Method

This method is to be used when the following conditions are satisfied:

(1) When the independent variable (X) advances by equal intervals.

(2) When the value of the dependent variable (Y) is to be interpolated for a value of X which lies in the middle of the table.

The formula used is:

$$y_x = y_0 + x\,\Delta^1 y_0 + \frac{x(x-1)}{2!}\,\Delta^2 y_{-1} + \frac{(x+1)x(x-1)}{3!}\,\Delta^3 y_{-1} + \frac{(x+1)x(x-1)(x-2)}{4!}\,\Delta^4 y_{-2} + \cdots$$

$$x = \frac{\text{Interpolation item} - \text{Preceding item}}{\text{Difference between adjoining items}}$$

Steps:

1. For the variable x, use x₀ to denote the item preceding the figure to be interpolated. Denote items above x₀ by x₋₁, x₋₂, etc., and items after x₀ by x₁, x₂, etc. In the same manner symbolise the y variable.

2. Prepare a difference table as in Newton's method. Unlike Newton's method, however, the symbols used in the difference table will correspond to those of the y variable, i.e., they will be Δ¹y₋₁, Δ²y₋₁, Δ³y₋₁, Δ⁴y₋₂, etc.

Illustration 12. Using Newton's Gauss formula, interpolate the value of Y when X = 13:
X : 10  12  14  16
Y : 25  32  40  50

Solution. INTERPOLATING THE VALUE OF Y FOR X = 13 BY NEWTON'S GAUSS METHOD

                          Differences
X          Y             First          Second         Third
10  x₋₁    25  y₋₁
                         +7  Δ¹y₋₁
12  x₀     32  y₀                       +1  Δ²y₋₁
                         +8  Δ¹y₀                      +1  Δ³y₋₁
14  x₁     40  y₁                       +2  Δ²y₀
                        +10  Δ¹y₁
16  x₂     50  y₂

yₓ = y₀ + xΔ¹y₀ + [x(x - 1)/2!]Δ²y₋₁ + [(x + 1)x(x - 1)/3!]Δ³y₋₁, and x = (13 - 12)/2 = 0.5
y₁₃ = 32 + 0.5(8) + [0.5(0.5 - 1)/2](1) + [(0.5 + 1)(0.5)(0.5 - 1)/(3 × 2)](1)
    = 32 + 4 - 0.125 - 0.0625 = 35.8125, or 35.8
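The Gauss forward formula differs from Newton's formula only in which differences it picks (Δ¹y₀, Δ²y₋₁, Δ³y₋₁, Δ⁴y₋₂, ...) and in the order in which the factors x, (x - 1), (x + 1), (x - 2), ... enter. A minimal Python sketch follows, assuming equally spaced X values; the helper names are hypothetical, and the code simply stops when the next required difference is not in the table.

```python
def difference_table(ys):
    """table[k][i] is the k-th forward difference starting at ys[i]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


def gauss_forward(xs, ys, x_target):
    """Gauss forward formula about x0, the tabulated value just below x_target:
    y = y0 + x D1y0 + x(x-1)/2! D2y-1 + (x+1)x(x-1)/3! D3y-1
          + (x+1)x(x-1)(x-2)/4! D4y-2 + ...
    Terms are added while the required difference exists in the table."""
    h = xs[1] - xs[0]
    i0 = max(i for i, xv in enumerate(xs) if xv <= x_target)  # position of x0
    u = (x_target - xs[i0]) / h
    table = difference_table(ys)
    total, factor = ys[i0], 1.0
    for k in range(1, len(ys)):
        # new factor: u, (u-1), (u+1), (u-2), (u+2), ... divided by k
        factor *= (u - k // 2 if k % 2 == 0 else u + (k - 1) // 2) / k
        start = i0 - k // 2  # D1 from y0; D2, D3 from y-1; D4, D5 from y-2
        if start < 0 or start >= len(table[k]):
            break
        total += factor * table[k][start]
    return total


print(gauss_forward([10, 12, 14, 16], [25, 32, 40, 50], 13))  # 35.8125
```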


(iii) Newton's Gauss (Backward) Formula

This formula is to be applied when the following two conditions are satisfied:
(i) When the independent variable X advances by equal intervals.
(ii) When the figure to be interpolated is towards the end of the table.
The formula is:

$$y_x = y_0 - x\,\Delta^1 y_{-1} + \frac{x(x-1)}{2!}\,\Delta^2 y_{-1} - \frac{(x+1)x(x-1)}{3!}\,\Delta^3 y_{-2} + \frac{(x+1)x(x-1)(x-2)}{4!}\,\Delta^4 y_{-2} - \cdots$$

where y₀ is the figure succeeding the missing figure, and

$$x = \frac{\text{Value of X succeeding the figure to be interpolated} - \text{Value of X to be interpolated}}{\text{Difference between adjoining items}}$$

Illustration 13. The following table gives the population of a town during the last six censuses. Estimate the population for the year 1956.

Year   Population ('000)     Year   Population ('000)
1921        20               1951        39
1931        24               1961        45
1941        30               1971        50

Solution. Since the figure to be interpolated is towards the end of the series, Newton's Gauss (Backward) formula will be more appropriate.

INTERPOLATING POPULATION FOR THE YEAR 1956 BY NEWTON'S GAUSS (BACKWARD) FORMULA

                                      Differences
Year   Population          First         Second        Third         Fourth
            y              Δ¹            Δ²            Δ³            Δ⁴
1921   20   y₋₄
                           +4  Δ¹y₋₄
1931   24   y₋₃                          +2  Δ²y₋₄
                           +6  Δ¹y₋₃                   +1  Δ³y₋₄
1941   30   y₋₂                          +3  Δ²y₋₃                   -7  Δ⁴y₋₄
                           +9  Δ¹y₋₂                   -6  Δ³y₋₃
1951   39   y₋₁                          -3  Δ²y₋₂                   +8  Δ⁴y₋₃
                           +6  Δ¹y₋₁                   +2  Δ³y₋₂
1961   45   y₀                           -1  Δ²y₋₁
                           +5  Δ¹y₀
1971   50   y₁

yₓ = y₀ - xΔ¹y₋₁ + [x(x - 1)/2!]Δ²y₋₁ - [(x + 1)x(x - 1)/3!]Δ³y₋₂, and x = (1961 - 1956)/10 = 0.5
y₁₉₅₆ = 45 - 0.5(6) + [0.5(0.5 - 1)/2](-1) - [(0.5 + 1)(0.5)(0.5 - 1)/(3 × 2)](2)
      = 45 - 3 + 0.125 + 0.125 = 42.25 thousand
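The Gauss backward formula can be checked numerically. The sketch below is a minimal illustration, assuming equally spaced X values; the helper names are hypothetical. It evaluates the standard Gauss backward expansion y₀ + pΔ¹y₋₁ + (p+1)p/2! Δ²y₋₁ + (p+1)p(p-1)/3! Δ³y₋₂ + ..., where p = -x and x is measured backwards from y₀ as defined above, and it stops when the next required difference is not available. For the census data it gives 42.25 thousand for 1956.

```python
def difference_table(ys):
    """table[k][i] is the k-th forward difference starting at ys[i]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


def gauss_backward(xs, ys, x_target):
    """Gauss backward formula about y0, the tabulated value just after x_target.
    With x = (x0 - x_target) / h and p = -x it is evaluated as
      y = y0 + p D1y-1 + (p+1)p/2! D2y-1 + (p+1)p(p-1)/3! D3y-2
            + (p+2)(p+1)p(p-1)/4! D4y-2 + ...
    Terms are added while the required difference exists in the table."""
    h = xs[1] - xs[0]
    i0 = min(i for i, xv in enumerate(xs) if xv >= x_target)  # position of y0
    p = -(xs[i0] - x_target) / h
    table = difference_table(ys)
    total, factor = ys[i0], 1.0
    for k in range(1, len(ys)):
        # new factor: p, (p+1), (p-1), (p+2), (p-2), ... divided by k
        factor *= (p + k // 2 if k % 2 == 0 else p - (k - 1) // 2) / k
        start = i0 - (k + 1) // 2  # D1, D2 from y-1; D3, D4 from y-2; ...
        if start < 0 or start >= len(table[k]):
            break
        total += factor * table[k][start]
    return total


years = [1921, 1931, 1941, 1951, 1961, 1971]
population = [20, 24, 30, 39, 45, 50]  # thousands
print(gauss_backward(years, population, 1956))  # 42.25
```

As a cross-check, the Gauss forward formula taken about 1951 with differences up to the third order gives the same 42.25.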
(iv) Newton's Method for Divided Differences

The method is to be used when the value of the independent variable X advances by unequal intervals.
The formula is:

$$y_x = y_0 + (x - x_0)\,\Delta^1 y_0 + (x - x_0)(x - x_1)\,\Delta^2 y_0 + (x - x_0)(x - x_1)(x - x_2)\,\Delta^3 y_0 + \cdots$$

where Δ¹y₀, Δ²y₀ and Δ³y₀ are the first, second and third leading divided differences respectively.
Steps:
1. Prepare a table of divided differences. The method of preparing this table is given below:

DIVIDED DIFFERENCES

x    y     First Δ¹                      Second Δ²                             Third Δ³
x₀   y₀
           (y₁ - y₀)/(x₁ - x₀) = Δ¹y₀
x₁   y₁                                  (Δ¹y₁ - Δ¹y₀)/(x₂ - x₀) = Δ²y₀
           (y₂ - y₁)/(x₂ - x₁) = Δ¹y₁                                          (Δ²y₁ - Δ²y₀)/(x₃ - x₀) = Δ³y₀
x₂   y₂                                  (Δ¹y₂ - Δ¹y₁)/(x₃ - x₁) = Δ²y₁
           (y₃ - y₂)/(x₃ - x₂) = Δ¹y₂
x₃   y₃

2. The value to be interpolated is denoted by x.

3. The above formula is applied.

Illustration 14. The observed values of a function are respectively 168, 120, 72 and
63 at the four positions 3, 7, 9 and 10 of the independent variable. What best estimate can
you give for the value of the function at the position 6 of the independent variable?

Solution. Since the independent variable is advancing by unequal intervals, we
will have to use Newton's divided differences method.

INTERPOLATING THE VALUE OF y FOR x = 6 BY THE DIVIDED
DIFFERENCES METHOD

x         y          First Δ¹                         Second Δ²                          Third Δ³
3  x₀     168  y₀
                     (120 − 168)/(7 − 3) = −12  Δ¹₀
7  x₁     120  y₁                                     (−24 − (−12))/(9 − 3) = −2  Δ²₀
                     (72 − 120)/(9 − 7) = −24                                             (5 − (−2))/(10 − 3) = 1  Δ³₀
9  x₂     72   y₂                                     (−9 − (−24))/(10 − 7) = 5
                     (63 − 72)/(10 − 9) = −9
10 x₃     63   y₃

$$y_x = y_0 + (x - x_0)\,\Delta^1_0 + (x - x_0)(x - x_1)\,\Delta^2_0 + (x - x_0)(x - x_1)(x - x_2)\,\Delta^3_0$$
$$y_6 = 168 + (6-3)(-12) + (6-3)(6-7)(-2) + (6-3)(6-7)(6-9)(1)$$
$$= 168 - 36 + 6 + 9 = 147$$

The answer is the same as that obtained by Lagrange's method (see Illustration 24).
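The two steps above (building the divided-difference table, then applying the formula) can be sketched in Python as follows. This is an illustrative implementation with hypothetical function names; it uses exact fractions so that the leading divided differences match the hand calculation of Illustration 14.

```python
from fractions import Fraction


def leading_divided_differences(xs, ys):
    """Return [y0, Δ¹0, Δ²0, ...], built column by column as in the table above."""
    coeffs = [Fraction(y) for y in ys]
    n = len(xs)
    for order in range(1, n):
        # Work from the bottom of the table upwards so that each entry still
        # holds the previous column's value when its neighbour needs it.
        for i in range(n - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - order])
    return coeffs


def newton_divided(xs, ys, x):
    """y0 + (x - x0)Δ¹0 + (x - x0)(x - x1)Δ²0 + ..."""
    result, product = Fraction(0), Fraction(1)
    for k, coeff in enumerate(leading_divided_differences(xs, ys)):
        result += coeff * product
        product *= x - xs[k]
    return result


xs, ys = [3, 7, 9, 10], [168, 120, 72, 63]
print(leading_divided_differences(xs, ys))  # [Fraction(168, 1), Fraction(-12, 1), Fraction(-2, 1), Fraction(1, 1)]
print(newton_divided(xs, ys, 6))            # 147
```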
4. Lagrange's Method
This method, given by the famous French mathematician Lagrange, is applied in those cases where the x series advances by unequal intervals, though it is equally valid when the intervals are equal. The formula given by him is as follows:

$$y_x = y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)\cdots(x-x_n)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)\cdots(x_0-x_n)} + y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)\cdots(x-x_n)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)\cdots(x_1-x_n)} + y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)\cdots(x-x_n)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)\cdots(x_2-x_n)} + \cdots + y_n\,\frac{(x-x_0)(x-x_1)(x-x_2)\cdots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)(x_n-x_2)\cdots(x_n-x_{n-1})}$$

where x₀, x₁, x₂, etc., are the given values of the x variable and y₀, y₁, y₂, etc.,
are the corresponding values of the y variable; y_x is the figure to be interpolated.

If we are given four values, then

$$y_x = y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)} + y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + y_3\,\frac{(x-x_0)(x-x_1)(x-x_2)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}$$
Illustration 15. Determine the percentage of criminals under 35 years of age
from the following data:

Age                  Percentage of criminals
Under 25 years       52.0
Under 30 years       67.3
Under 40 years       84.1
Under 50 years       94.4
(B. Com., Nagpur, 1974)
Solution. Estimating the percentage of criminals under 35 years.
Age                      Percentage of criminals
Under 25 years   x₀      52.0   y₀
Under 30 years   x₁      67.3   y₁
Under 40 years   x₂      84.1   y₂
Under 50 years   x₃      94.4   y₃
Applying Lagrange's method,

$$y_x = y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)} + y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + y_3\,\frac{(x-x_0)(x-x_1)(x-x_2)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}$$

Substituting the values (x = 35),

$$y_{35} = 52.0\,\frac{(35-30)(35-40)(35-50)}{(25-30)(25-40)(25-50)} + 67.3\,\frac{(35-25)(35-40)(35-50)}{(30-25)(30-40)(30-50)} + 84.1\,\frac{(35-25)(35-30)(35-50)}{(40-25)(40-30)(40-50)} + 94.4\,\frac{(35-25)(35-30)(35-40)}{(50-25)(50-30)(50-40)}$$
$$= 52.0\,\frac{(5)(-5)(-15)}{(-5)(-15)(-25)} + 67.3\,\frac{(10)(-5)(-15)}{(5)(-10)(-20)} + 84.1\,\frac{(10)(5)(-15)}{(15)(10)(-10)} + 94.4\,\frac{(10)(5)(-5)}{(25)(20)(10)}$$
$$= -10.4 + 50.48 + 42.05 - 4.72 = 77.41$$

Thus 77.41 per cent of the criminals are under 35 years of age.
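Lagrange's formula translates directly into a double loop: each known y is multiplied by its weight, the product of (x − x_j)/(x_i − x_j) over all other points. A minimal Python sketch (the function name is illustrative), checked against Illustration 15 and against the divided-difference result of Illustration 14:

```python
def lagrange(xs, ys, x):
    """Lagrange's interpolation formula: sum of y_i times its weight
    prod over j != i of (x - x_j) / (x_i - x_j). The x values need not be equally spaced."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total


# Illustration 15: percentage of criminals under 25, 30, 40 and 50 years
ages = [25, 30, 40, 50]
percent = [52.0, 67.3, 84.1, 94.4]
print(f"{lagrange(ages, percent, 35):.3f}")                    # 77.405
print(f"{lagrange([3, 7, 9, 10], [168, 120, 72, 63], 6):.3f}")  # 147.000
```

The unrounded value 77.405 is what the text rounds to 77.41.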
Illustration 16. Estimate by a suitable method of interpolation the number of
persons whose income exceeds Rs. 19 but does not exceed Rs. 25 from the following data:

Income in Rs.                   No. of persons
1 and not exceeding 9           50
10 and not exceeding 19         70
19 and not exceeding 28         203
28 and not exceeding 37         406
37 and not exceeding 46         304
(M.A. Econ., Raj., 1972)

Solution. Since the class intervals are not uniform throughout, we will apply
Lagrange's method. We first estimate the number of persons with income not exceeding Rs. 25.

Income                          No. of persons
Not exceeding 9     x₀          50      y₀
Not exceeding 19    x₁          120     y₁
Not exceeding 28    x₂          323     y₂
Not exceeding 37    x₃          729     y₃
Not exceeding 46    x₄          1,033   y₄
Here x = 25.

$$y_x = y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)(x-x_4)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)(x_0-x_4)} + y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)(x-x_4)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)(x_1-x_4)} + y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)(x-x_4)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)(x_2-x_4)} + y_3\,\frac{(x-x_0)(x-x_1)(x-x_2)(x-x_4)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)(x_3-x_4)} + y_4\,\frac{(x-x_0)(x-x_1)(x-x_2)(x-x_3)}{(x_4-x_0)(x_4-x_1)(x_4-x_2)(x_4-x_3)}$$

Substituting the values, we get

$$y_{25} = 50\,\frac{(25-19)(25-28)(25-37)(25-46)}{(9-19)(9-28)(9-37)(9-46)} + 120\,\frac{(25-9)(25-28)(25-37)(25-46)}{(19-9)(19-28)(19-37)(19-46)} + 323\,\frac{(25-9)(25-19)(25-37)(25-46)}{(28-9)(28-19)(28-37)(28-46)} + 729\,\frac{(25-9)(25-19)(25-28)(25-46)}{(37-9)(37-19)(37-28)(37-46)} + 1033\,\frac{(25-9)(25-19)(25-28)(25-37)}{(46-9)(46-19)(46-28)(46-37)}$$
$$= 50\,\frac{(6)(-3)(-12)(-21)}{(-10)(-19)(-28)(-37)} + 120\,\frac{(16)(-3)(-12)(-21)}{(10)(-9)(-18)(-27)} + 323\,\frac{(16)(6)(-12)(-21)}{(19)(9)(-9)(-18)} + 729\,\frac{(16)(6)(-3)(-21)}{(28)(18)(9)(-9)} + 1033\,\frac{(16)(6)(-3)(-12)}{(37)(27)(18)(9)}$$
$$= -1.15 + 33.19 + 282.07 - 108.00 + 22.06 = 228.17 \approx 228$$

Hence the number of persons whose income exceeds Rs. 19 but does not exceed Rs. 25 = 228 − 120 = 108.
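Illustration 16 combines two steps: cumulating the class frequencies into "not exceeding" totals, interpolating the cumulative total at Rs. 25 by Lagrange's formula, and subtracting the total up to Rs. 19. A minimal Python sketch, assuming the class limits 9, 19, 28, 37 and 46 used in the solution; `lagrange_exact` is an illustrative helper that works in exact fractions.

```python
from fractions import Fraction
from itertools import accumulate


def lagrange_exact(xs, ys, x):
    """Lagrange's formula evaluated with exact fractions (integer data assumed)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                weight *= Fraction(x - xj, xi - xj)
        total += weight * yi
    return total


upper_limits = [9, 19, 28, 37, 46]          # "not exceeding" income limits (Rs.)
frequencies = [50, 70, 203, 406, 304]       # persons in each class
cumulative = list(accumulate(frequencies))  # [50, 120, 323, 729, 1033]

up_to_25 = lagrange_exact(upper_limits, cumulative, 25)
between_19_and_25 = up_to_25 - cumulative[1]   # remove those not exceeding Rs. 19
print(round(float(up_to_25), 2), round(float(between_19_and_25)))  # 228.17 108
```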


5. Parabolic Curve Method

This method of interpolation is of universal application, i.e., it can be applied to all types
of interpolation. When this method is used, one variable is taken as
dependent and the other as independent, and the relationship between them is expressed by a curve of the form

$$y = a + bx + cx^2 + dx^3 + \cdots + nx^n$$

This is a curve of the nth order. The order of the curve
depends upon the number of known items in the series; it is one
less than the number of known items. For example, if the known items
are four we would take a curve of the 3rd order, i.e.,

$$y = a + bx + cx^2 + dx^3$$

Note. This method of interpolation is also known as the method of
simultaneous equations for the reason that a number of equations have to
be solved simultaneously.

This method is not very popular because when the number of known 
items exceeds four the calculations become too lengthy and hence time 
consuming. 


Illustration 17. The following table gives the cubes of certain values. Find out the
cube of 5 by applying the parabolic curve method.
Size of items:   3    4    5    6    7
Cube value:     27   64    ?  216  343
Solution. As there are 4 known values, we fit a parabola of the third order.
The equation of the parabola would be

$$y = a + bx + cx^2 + dx^3$$

We have to determine the values of a, b, c and d.
Taking deviations of x (i.e., size of items) from 5:

x:   −2    −1    0     +1    +2
y:   27    64    y₀    216   343

y₀ is the figure to be interpolated. Substituting the values in the equation, we get
27 = a − 2b + 4c − 8d      ...(i)
64 = a − b + c − d         ...(ii)
y₀ = a                     ...(iii)
216 = a + b + c + d        ...(iv)
343 = a + 2b + 4c + 8d     ...(v)

Adding (ii) and (iv):
 64 = a − b + c − d
216 = a + b + c + d
280 = 2a + 2c              ...(vi)

Adding (i) and (v):
 27 = a − 2b + 4c − 8d
343 = a + 2b + 4c + 8d
370 = 2a + 8c              ...(vii)

Solving (vi) and (vii): multiplying (vi) by 4,
1120 = 8a + 8c
 370 = 2a + 8c
Subtracting, 750 = 6a or a = 125

Since y₀ = a, the cube of 5 is 125.
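The elimination done by hand above can be delegated to a small solver. The following Python sketch (an illustration, not the book's procedure verbatim) sets up one equation a + bx + cx² + … = y per known point and solves the system by Gauss-Jordan elimination in exact fractions; `fit_polynomial` is a hypothetical name. Because the x values are measured from the point of interpolation, the constant term a is the interpolated value.

```python
from fractions import Fraction


def fit_polynomial(xs, ys):
    """Fit y = a + b*x + c*x**2 + ... through the known points by solving one
    equation per point (Gauss-Jordan elimination on exact fractions).
    Returns [a, b, c, ...]; a is the value of y at x = 0."""
    n = len(xs)
    rows = [[Fraction(x) ** p for p in range(n)] + [Fraction(y)]
            for x, y in zip(xs, ys)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column to the pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        # Eliminate this unknown from every other equation.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Illustration 17: deviations of the size of items from 5, and their cubes
print(fit_polynomial([-2, -1, 1, 2], [27, 64, 216, 343]))
# [Fraction(125, 1), Fraction(75, 1), Fraction(15, 1), Fraction(1, 1)]
# Illustration 18: years coded as (year - 1956) / 5
print(fit_polynomial([-3, -1, 1, 3], [18, 22, 25, 30])[0])  # 375/16
```

The first result is exactly (x + 5)³ = 125 + 75x + 15x² + x³, as expected for cubes; 375/16 = 23.4375 lakhs for Illustration 18.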
Illustration 18. From the following data of the population of a city in lakhs, find
out the population for 1956 by the Parabolic Curve Method:

Year                   1941   1951   1961   1971
Population in lakhs    18     22     25     30

Solution. Since the known values are four we would fit a parabola of the 3rd
order:

$$y = a + bx + cx^2 + dx^3$$

We have to determine the values of a, b, c and d. Denoting years by x and
population by y and taking the origin at 1956 (i.e., 1956 = 0) we get
x = −15, −5, 0, +5, +15
y = 18, 22, y₀, 25, 30
where y₀ is the value to be interpolated.
The calculations can be further simplified if we take 5 as the common unit for x.
If that is done,
x = −3, −1, 0, +1, +3
y = 18, 22, y₀, 25, 30
Substituting the values of x and y in the equation
y = a + bx + cx² + dx³, we get

18 = a − 3b + 9c − 27d     ...(i)
22 = a − b + c − d         ...(ii)
y₀ = a                     ...(iii)
25 = a + b + c + d         ...(iv)
30 = a + 3b + 9c + 27d     ...(v)

The value of a would give us the desired result.
Adding equations (ii) and (iv),
22 = a − b + c − d
25 = a + b + c + d
47 = 2a + 2c               ...(vi)

Adding equations (i) and (v),
18 = a − 3b + 9c − 27d
30 = a + 3b + 9c + 27d
48 = 2a + 18c              ...(vii)

Multiplying equation (vi) by 9 and deducting equation (vii) from it,
423 = 18a + 18c            ...(viii)
 48 = 2a + 18c
375 = 16a or a = 23.44 (approx.)

Thus the population for the year 1956 = 23.44 lakhs (approx.).

This method of interpolation is not very popular because the work 
becomes tedious and the calculations take a long time as the number of 
known values increases. However, it has the advantage of easy adap- 
tability to all types of problems. 

EXTRAPOLATION 

As pointed out earlier, extrapolation refers to estimating a value
for a future period. In order to extrapolate a particular value, the various
methods discussed above for interpolation can be adopted. The choice
of a particular method would depend upon: (a) the requirement of the
question, and (b) the nature of the given data.

The following example shall illustrate the procedure for extra- 
polation : 

Illustration 19. Extrapolate the sales of steel for 1975 from the following data:

Years               1950   1955   1960   1965   1970
Sales (in tonnes)   251    279    319    361    439

Solution. APPLYING BINOMIAL EXPANSION METHOD

Years    Sales (m. tonnes)
1950     251    y₀
1955     279    y₁
1960     319    y₂
1965     361    y₃
1970     439    y₄
1975     ?      y₅

Since five values are known, the fifth leading difference is taken to be zero:
(y − 1)⁵ = 0, i.e.,
y₅ − 5y₄ + 10y₃ − 10y₂ + 5y₁ − y₀ = 0
Substituting the values:
y₅ − (5 × 439) + (10 × 361) − (10 × 319) + (5 × 279) − 251 = 0
y₅ = 2195 − 3610 + 3190 − 1395 + 251 = 631
Thus the expected sales for the year 1975 = 631 m. tonnes.
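The binomial expansion method always reduces to one linear equation: with n known values the nth leading difference, Σ (−1)ⁿ⁻ᵏ C(n, k) yₖ, is set to zero and solved for the single unknown. A minimal Python sketch, assuming equally spaced data with exactly one missing entry marked `None`; `binomial_missing` is a hypothetical name. It reproduces Illustrations 19, 20 and 27.

```python
from fractions import Fraction
from math import comb


def binomial_missing(values):
    """Binomial expansion method: `values` is an equally spaced series with
    exactly one None. With n known values the n-th leading difference is
    assumed to be zero, i.e. sum over k of (-1)**(n-k) * C(n, k) * y_k = 0,
    which is solved for the unknown term."""
    if sum(v is None for v in values) != 1:
        raise ValueError("exactly one value must be missing")
    n = len(values) - 1              # number of known values
    known = Fraction(0)
    for k, v in enumerate(values):
        coeff = (-1) ** (n - k) * comb(n, k)
        if v is None:
            unknown_coeff = coeff
        else:
            known += coeff * Fraction(str(v))   # str() keeps decimals such as 130.9 exact
    return -known / unknown_coeff


print(binomial_missing([251, 279, 319, 361, 439, None]))            # 631 (Illustration 19)
print(binomial_missing([1331, 1729, 2197, None, 3375, 4096, 4913]))  # 27437/10 (Illustration 20)
print(float(binomial_missing([130.9, 120.4, 133.3, None, 148.6])))   # 149.425 (Illustration 27)
```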
MISCELLANEOUS ILLUSTRATIONS 


Illustration 20. Interpolate the missing figure from the following table with the
help of a suitable formula:

1961    1331        1965    3375
1962    1729        1966    4096
1963    2197        1967    4913
1964    ?
(B. Com., Nagpur, 1972)

Solution. Applying the binomial expansion method to determine the unknown
value. Since the known values are six, the sixth leading difference will be zero:

(y − 1)⁶ = 0 or Δ⁶₀ = 0, where
Δ⁶₀ = y₆ − 6y₅ + 15y₄ − 20y₃ + 15y₂ − 6y₁ + y₀ = 0

We are given
1961    1331   y₀        1965    3375   y₄
1962    1729   y₁        1966    4096   y₅
1963    2197   y₂        1967    4913   y₆
1964    ?      y₃

4913 − 6(4096) + 15(3375) − 20y₃ + 15(2197) − 6(1729) + 1331 = 0
4913 − 24576 + 50625 − 20y₃ + 32955 − 10374 + 1331 = 0
−20y₃ = −4913 + 24576 − 50625 − 32955 + 10374 − 1331
−20y₃ = −54874 or y₃ = 2743.7

Thus the missing value corresponding to the year 1964 is 2743.7.

Illustration 21. From the data given below estimate the number of persons
living between the ages of 35 and 42:

Age (years):               20    30    40    50
No. of persons living:    513   439   346   243
(M. Com., Nagpur, 1974)

Solution. Estimating the number of persons living between 35 and 42.

Age (years)   No. of persons living   First Δ       Second Δ²      Third Δ³
20            513   y₀
                                      −74   Δ₀
30            439   y₁                              −19   Δ²₀
                                      −93                          +9   Δ³₀
40            346   y₂                              −10
                                      −103
50            243   y₃

$$y_x = y_0 + x\,\Delta_0 + \frac{x(x-1)}{2!}\,\Delta^2_0 + \frac{x(x-1)(x-2)}{3!}\,\Delta^3_0$$

Number of persons living at 35: x = (35 − 20)/10 = 1.5

$$y_{35} = 513 + 1.5(-74) + \frac{1.5(1.5-1)}{2}(-19) + \frac{1.5(1.5-1)(1.5-2)}{6}(9)$$
$$= 513 - 111 - 7.125 - 0.5625 = 394.3125 \text{ or } 394$$

Number of persons living at 42: x = (42 − 20)/10 = 2.2

$$y_{42} = 513 + 2.2(-74) + \frac{2.2(2.2-1)}{2}(-19) + \frac{2.2(2.2-1)(2.2-2)}{6}(9)$$
$$= 513 - 162.8 - 25.08 + 0.792 = 325.912 \text{ or } 326$$

Number of persons living at 35 = 394
Number of persons living at 42 = 326
Number of persons living between 35 and 42 = 394 − 326 = 68.
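Newton's forward-difference formula can be evaluated without writing out each term: keep the current row of differences and update the binomial coefficient x(x − 1)…(x − k + 1)/k! as you go. A minimal Python sketch (the function name is illustrative), assuming equally spaced x values and x = (X − X₀)/h; it uses every available leading difference, exactly as the worked solutions do.

```python
def newton_forward(ys, x):
    """Newton's forward-difference formula using every available leading difference:
    y0 + xΔ0 + x(x-1)/2! Δ²0 + x(x-1)(x-2)/3! Δ³0 + ...,  with x = (X - X0) / h."""
    diffs = list(ys)
    result, coeff = 0.0, 1.0
    for k in range(len(ys)):
        result += coeff * diffs[0]              # diffs[0] is the k-th leading difference
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        coeff *= (x - k) / (k + 1)              # next coefficient x(x-1)...(x-k)/(k+1)!
    return result


living = [513, 439, 346, 243]                   # Illustration 21: ages 20, 30, 40, 50
at_35 = newton_forward(living, (35 - 20) / 10)
at_42 = newton_forward(living, (42 - 20) / 10)
print(f"{at_35:.4f} {at_42:.3f} {at_35 - at_42:.1f}")   # 394.3125 325.912 68.4

for ys, x, label in [([572, 439, 346, 243], 1.5, "Illustration 22"),
                     ([233, 391, 582, 799, 1035], 0.5, "Illustration 23"),
                     ([447, 484, 505, 511, 514], 0.6, "Illustration 25"),
                     ([296, 599, 804, 918, 966], 0.8, "Illustration 26")]:
    print(label, f"{newton_forward(ys, x):.4f}")
# Illustration 22 390.6250, Illustration 23 307.4375,
# Illustration 25 470.8064, Illustration 26 546.1472
```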

Illustration 22. If Lₓ represents the number living at age x in a life table,
interpolate by using Newton's method the value of Lₓ for x = 35:

L₂₀ = 572,   L₃₀ = 439,   L₄₀ = 346,   L₅₀ = 243
(M.A. Econ., Punjab, 1974)

Solution. INTERPOLATING THE VALUE FOR x = 35 BY NEWTON'S METHOD

Age x   No. of persons   First Δ        Second Δ²      Third Δ³
20      572   y₀
                         −133   Δ₀
30      439   y₁                        +40   Δ²₀
                         −93                           −50   Δ³₀
40      346   y₂                        −10
                         −103
50      243   y₃

$$y_x = y_0 + x\,\Delta_0 + \frac{x(x-1)}{2!}\,\Delta^2_0 + \frac{x(x-1)(x-2)}{3!}\,\Delta^3_0, \qquad x = \frac{35-20}{10} = 1.5$$

$$y_{35} = 572 + 1.5(-133) + \frac{1.5(1.5-1)}{2}(40) + \frac{1.5(1.5-1)(1.5-2)}{6}(-50)$$
$$= 572 - 199.5 + 15 + 3.125 = 390.625 \text{ or } 391$$


Illustration 23. Use Newton's formula and interpolate the migration from
Town A in the year 1963 from the following data:

Years                    1962   1964   1966   1968   1970
Migration from Town A    233    391    582    799    1,035
(M.A. Econ., Punjab, 1972)

Solution. APPLYING NEWTON'S METHOD

Year   Migration from Town A   First Δ       Second Δ²     Third Δ³     Fourth Δ⁴
1962   233     y₀
                               +158   Δ₀
1964   391     y₁                            +33   Δ²₀
                               +191                        −7   Δ³₀
1966   582     y₂                            +26                        0   Δ⁴₀
                               +217                        −7
1968   799     y₃                            +19
                               +236
1970   1,035   y₄

$$y_x = y_0 + x\,\Delta_0 + \frac{x(x-1)}{2!}\,\Delta^2_0 + \frac{x(x-1)(x-2)}{3!}\,\Delta^3_0 + \frac{x(x-1)(x-2)(x-3)}{4!}\,\Delta^4_0, \qquad x = \frac{1963-1962}{2} = 0.5$$

$$y_{1963} = 233 + 0.5(158) + \frac{0.5(0.5-1)}{2}(33) + \frac{0.5(0.5-1)(0.5-2)}{6}(-7) + \frac{0.5(0.5-1)(0.5-2)(0.5-3)}{24}(0)$$
$$= 233 + 79 - 4.125 - 0.4375 + 0 = 307.438 \text{ or } 307$$

Thus in 1963 three hundred and seven persons migrated from Town A.

Illustration 24. The observed values of a function are respectively 168, 120, 72
and 63 at the four positions 3, 7, 9, and 10 of the independent variable. What is the best
estimate you can give for the value of the function at the position 6 of the independent
variable? (M. Com., Nagpur, 1972)

Solution. Estimation of the value of the function at the position 6 of the
independent variable.
Independent variable (x)       3  x₀      7  x₁      9  x₂      10  x₃
Values of the function (y)   168  y₀    120  y₁     72  y₂      63  y₃

Applying Lagrange's formula:

$$y_x = y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)} + y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + y_3\,\frac{(x-x_0)(x-x_1)(x-x_2)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}$$

$$y_6 = 168\,\frac{(6-7)(6-9)(6-10)}{(3-7)(3-9)(3-10)} + 120\,\frac{(6-3)(6-9)(6-10)}{(7-3)(7-9)(7-10)} + 72\,\frac{(6-3)(6-7)(6-10)}{(9-3)(9-7)(9-10)} + 63\,\frac{(6-3)(6-7)(6-9)}{(10-3)(10-7)(10-9)}$$
$$= 168\,\frac{(-1)(-3)(-4)}{(-4)(-6)(-7)} + 120\,\frac{(3)(-3)(-4)}{(4)(-2)(-3)} + 72\,\frac{(3)(-1)(-4)}{(6)(2)(-1)} + 63\,\frac{(3)(-1)(-3)}{(7)(3)(1)}$$
$$= 12 + 180 - 72 + 27 = 147$$
Illustration 25. Estimate the number of students who get more than 48 but not
more than 50 marks from the following data:
Marks up to         45    50    55    60    65
No. of students    447   484   505   511   514
(M.A. Econ., Jabalpur, 1975)
Solution. By applying Newton's method we shall be able to find the desired
figure:

Marks       No. of students   First Δ      Second Δ²     Third Δ³     Fourth Δ⁴
Up to 45    447   y₀
                              37   Δ₀
Up to 50    484   y₁                       −16   Δ²₀
                              21                         +1   Δ³₀
Up to 55    505   y₂                       −15                        +11   Δ⁴₀
                               6                         +12
Up to 60    511   y₃                        −3
                               3
Up to 65    514   y₄

$$y_x = y_0 + x\,\Delta_0 + \frac{x(x-1)}{2!}\,\Delta^2_0 + \frac{x(x-1)(x-2)}{3!}\,\Delta^3_0 + \frac{x(x-1)(x-2)(x-3)}{4!}\,\Delta^4_0, \qquad x = \frac{48-45}{5} = 0.6$$

$$y_{48} = 447 + 0.6(37) + \frac{0.6(0.6-1)}{2}(-16) + \frac{0.6(0.6-1)(0.6-2)}{6}(1) + \frac{0.6(0.6-1)(0.6-2)(0.6-3)}{24}(11)$$
$$= 447 + 22.2 + 1.92 + 0.056 - 0.369 = 470.807 \text{ or } 471$$

Number of students getting up to 50 marks = 484
Number of students getting up to 48 marks = 471
Number of students getting more than 48, but not more than 50 marks
= 484 − 471 = 13


Illustration 26. Using Newton's method of interpolation, estimate from the
following data the number of workers earning Rs. 24 or more but less than Rs. 25:

Earning less than (Rs.)   20    25    30    35    40
Number of workers        296   599   804   918   966
(M.A. Econ., Meerut, 1976)

Solution. By applying Newton's method, determine first the number of workers
earning less than Rs. 24.

Earning (Rs.)    No. of workers   First Δ       Second Δ²     Third Δ³     Fourth Δ⁴
Less than 20     296   y₀
                                  +303   Δ₀
Less than 25     599   y₁                       −98   Δ²₀
                                  +205                        +7   Δ³₀
Less than 30     804   y₂                       −91                        +18   Δ⁴₀
                                  +114                        +25
Less than 35     918   y₃                       −66
                                  +48
Less than 40     966   y₄

$$y_x = y_0 + x\,\Delta_0 + \frac{x(x-1)}{2!}\,\Delta^2_0 + \frac{x(x-1)(x-2)}{3!}\,\Delta^3_0 + \frac{x(x-1)(x-2)(x-3)}{4!}\,\Delta^4_0, \qquad x = \frac{24-20}{5} = 0.8$$

$$y_{24} = 296 + 0.8(303) + \frac{0.8(0.8-1)}{2}(-98) + \frac{0.8(0.8-1)(0.8-2)}{6}(7) + \frac{0.8(0.8-1)(0.8-2)(0.8-3)}{24}(18)$$
$$= 296 + 242.4 + 7.84 + 0.224 - 0.317 = 546.147 \text{ or } 546$$

Number of workers earning less than Rs. 25 = 599
Number of workers earning less than Rs. 24 = 546
∴ Number of workers earning between Rs. 24 and Rs. 25 = 599 − 546 = 53.


Illustration 27. The following data relate to the index number of agricultural
production (Base: triennium ending 1961-62 = 100):

Year     1971-72   1972-73   1973-74   1974-75   1975-76
Index    130.9     120.4     133.3     ?         148.6

Use the binomial expansion method to estimate the index for 1974-75.

Solution. Since the known values are four, the fourth leading differences will be
zero, i.e., (y − 1)⁴ = 0 or Δ⁴₀ = 0:

Δ⁴₀ = y₄ − 4y₃ + 6y₂ − 4y₁ + y₀ = 0

Year       Index
1971-72    130.9   y₀
1972-73    120.4   y₁
1973-74    133.3   y₂
1974-75    ?       y₃
1975-76    148.6   y₄

Substituting the values,
148.6 − 4y₃ + 6 × 133.3 − 4 × 120.4 + 130.9 = 0
−4y₃ = −148.6 − 799.8 + 481.6 − 130.9
−4y₃ = −597.7 or y₃ = 149.4
The term vital statistics* signifies either the data or the methods
applied in the analysis of the data which provide a description of the vital
events occurring in given communities. By vital events we mean such
events of human life as births, deaths, sickness, marriage, divorce, adoption,
legitimations, recognitions, separations, etc.; in short, all the events
which have to do with an individual’s entrance into or departure from 
life together with the changes in civil status which may occur to him 
during his lifetime. 

Vital statistics have to do with people rather than things and, consequently,
this branch of statistics has perhaps the second oldest history
in the world, surpassed in antiquity only by the closely related population
census. The population census, which is the most fundamental and far-reaching
statistical inquiry that can be undertaken, provides a picture
of the population and its characteristics at one moment of time; vital
statistics provide the tools for measuring the dynamics of, or the changes
which continuously occur in, this instantaneous picture.

Uses of Vital Statistics 

Vital statistics are extremely useful and their significance can be
judged from the following angles:

(1) Use to the individual
(2) Use to operating agencies
(3) Use in research, demographic and medical
(4) Use in public administration
(5) International use of vital statistics.

(1) Use to Individual. Records of birth, death, marriage and divorce
as well as those of legitimation, recognition, adoption and so forth, are
of paramount use to the individual. The basic registration document or a
certified copy thereof has legal significance to the person concerned which
is equalled by few of the other documents a man may acquire in his
lifetime.

(2) Use to Operating Agencies. Records of births, deaths and marriages
are useful to governmental agencies for a variety of administrative
purposes. For example, the control programmes for infectious diseases
within the family and within the community often depend on the death
registration report for their initiation. Public health …
post-natal care for the mother and child usually have th…

(3) Use in Research. Vital statistics are indispensable in demographic
research. The study of population movement and of the inter-

* “Vital statistics forms perhaps the most important branch of statistics as it
deals with mankind in the aggregate. It is the science of numbers applied to the life-history
of communities and nations.” (Arthur Newsholme: The Elements of Vital Statistics)
so as the advances in technology and public health focus attention on
demographic problems. The three directions which such an analysis
takes are: (1) population estimation, (2) population projection, and (3)
analytical studies.

Very closely allied to the role of vital statistics in demographic 
research is their use by the medical profession engaged in research. Medi- 
cal and pharmaceutical research, like demographic research, requires a 
certain number of guideposts. This guidance may be found in part at 
least in mortality and natality statistics. 

(4) Use in Public Administration. Statistics in general and vital
statistics in particular are fundamental elements in public administration,
which is the machinery and methods underlying all official programmes
of economic and social development in either "developed" or "underdeveloped"
areas. The role of vital statistics in over-all planning and
evaluation of economic and social development is the most important use
to which this body of data may be put.

(5) International Use of Vital Statistics. Vital statistics are also
useful from the international viewpoint. Only by a sufficiently wide survey 
of human facts can the required norms of all sorts be established, norms 
which represent the characters of the great unit constituted by the aggre- 
gation of all the nations. 

However, it should be noted that vital statistics, like all statistics and 
multiple vital records, are not ends in themselves but tools for the study 
and understanding of other phenomena. 

Methods of obtaining Vital Statistics 

There are three methods of obtaining vital statistics : 

1. Registration Method 

2. Census Enumeration 

3. Analytical Method—Estimation of vital rates using census data. 

1. Registration Method. The Registration method is the corner- 
stone of vital statistics. It may be defined as the continuous and per- 
manent compulsory recording of the occurrence and the characteristics of 
vital events, primarily for their value as legal documents and secondarily 
for their usefulness as a source of statistics. In most countries there is 
a system of registering the occurrence of every important vital event 
under legal requirements. For example, when a child is born, the matter 
has to be reported to the proper authorities, together with such informa- 
tion as the age of mother, religion of parents, etc. Similarly, when a 
man dies, the death is to be recorded with the appropriate authorities and a
certificate is to be obtained before the body is cremated.

Continuous permanent recording of vital events can best be ensured 
by means of legislation which makes registration compulsory. Such legis- 
lation should also provide sanctions for the enforcement of the obligation. 
Thus it will be seen that the registration method is characterised not 
only by the continuous character of the observations but also by the 
compulsory nature of the method. Both provisions are fundamental. Re- 
gistration of vital events for legal purposes is an almost universal require- 
ment. 

Data on births and deaths can also be obtained from the hospital 
record. 
2. Census Enumeration. In most countries of the world a population
census is undertaken generally at ten-year intervals. A census is an
enumeration at a specified time of individuals inhabiting a specified
area, during which particulars are collected regarding age, sex, marital
status, occupation, religion, etc.

The fundamental deficiency of the census method for collecting vital
statistics is that it can, at best, produce returns for the census year and
no other. Census years are usually ten years apart. For the intercensal
years, current vital statistics are not produced by the census method, and
thus that method fails in the first and minimum requisite for vital statistics,
i.e., the production of data on a current basis. Not only does the
census method fail to provide intercensal data but it also fails to record
completely the occurrence of births and deaths even for the census year.

Periodic surveys have been employed to secure ad hoc information
on births and deaths in areas where the registration method has not been
established or where it is very defective. In such situations, surveys have
the distinct advantages of making available some vital statistics not otherwise
obtainable and of securing at the same time data on the corresponding
population.

3. Analytical Method— Estimation of Vital Rates using Census Data. 
If it is assumed that the derivation of birth, death and marriage rates is
the object of collecting vital statistics, then there is still another method
which could be employed to yield these basic facts. This method is a …

… period. This indirect method yields aggregates only and that too solely
for the year of the census. It does not, therefore, justify its consideration
as a method of developing vital statistics which by definition must be
current and continuous. However, it is a method which has been developed …

… (i.e., age and sex) corresponding to different points of time …
registers we have data regarding the number of births and deaths
occurring during different periods.

$$P_t = P_0 + (B - D) + (I - E)$$

where $P_t$ = total population at a point of time
$P_0$ = total population recorded at the last census
$B$ = total number of births during the given period
$D$ = total number of deaths during the given period
$I$ = total number of immigrants during the given period
$E$ = total number of emigrants during the given period
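The balancing equation above simply carries the last census count forward by the natural increase (B − D) and the net migration (I − E). A minimal Python sketch; the function name and all figures are hypothetical, chosen only to show the arithmetic.

```python
def current_population(p0, births, deaths, immigrants, emigrants):
    """P_t = P_0 + (B - D) + (I - E): last census count carried forward by
    the natural increase and the net migration of the period."""
    return p0 + (births - deaths) + (immigrants - emigrants)


# Hypothetical figures, for illustration only
print(current_population(p0=250_000, births=9_000, deaths=4_000,
                         immigrants=1_500, emigrants=2_500))  # 254000
```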


Measurement of Fertility* 
In order to study the speed at which the population is increasing, 
fertility rates are used which are of various types. Important amongst 

these are : 
Crude Birth Rate. It is the simplest method of measuring fertility.
It acts as an index of the relative speed at which additions are being made
to the population through child-birth. In this method the number of
births is related to the total population. Since it is only a live birth that
signifies an addition to the existing population, live births alone are
considered in measuring fertility, thus excluding still births.


The annual crude birth rate is defined as:

$$\text{Crude birth rate} = \frac{\text{Annual births}}{\text{Annual mean population}} \times 1{,}000$$


In this measure the births are related to the mean population and 
not to the population at a particular date. The crude birth rate of a 
given year tells us at what rate births have augmented the population over 
the course of the year. 

The crude birth rate usually lies between 10 and 55 per 1,000. The 
level of the crude birth rate is determined by : 

(i) The sex and age distribution of the population ; and 

(ii) The fertility of the population, i.e., the average rate of child- 
bearing of females. 

A relatively high crude birth rate can be recorded if the sex and age
distribution is favourable even though fertility is low, i.e., countries with a
relatively large proportion of population in the 15-50 years age group will
have a relatively high crude birth rate, other things being equal.


Specific Fertility Rate. The concept of specific fertility arises out 
of the fact that fertility is affected by a number of factors such as age, 
marriage, State or region, urban-rural characteristics, etc. When fertility 
rate is calculated on the basis of age distribution, it is called the 
age-specific fertility rate. While calculating age-specific fertility rate 
women of different ages in the child-bearing age are placed in small age 
groups so as to put them at par with others of child-bearing capacity. The 
fertility of women differs from age to age and, therefore, the grouping of 
women of different ages is essential. The capacity to bear children is 
much higher in the age-group 20 to 25 than in the age-group 40 to 45. 




* In demography the term ‘fertility’ refers to the actual production of children.
Fertility must be distinguished from fecundity which refers to the capacity to bear
children. Fecundity sets an upper limit to fertility.
$$\text{S.F.R.} = \frac{\text{Number of live births which occurred to females of a specified age-group of the population of a given geographic area during a given year}}{\text{Mid-year female population of the specified age-group in the given geographic area during the same year}} \times 1{,}000$$
General Fertility Rate. This rate refers to the number of children born per
1,000 females of reproductive or child-bearing age. Thus the numerator of this rate
would remain the same as in the crude rate, but the denominator would be limited
to the age-sex group of the population able to contribute to the birth rate. The
formula for such a rate is:

$$\text{General fertility rate (G.F.R.)} = \frac{\text{Number of live births which occurred among the population of a given geographic area during a given year}}{\text{Mid-year female population of ages 15 to 49 in the given geographic area during the same year}} \times 1{,}000$$

The computation of the G.F.R. requires that a decision be taken
beforehand as to which years of life a woman should be included in
the child-bearing period. Although the practice varies in this respect,
generally the child-bearing age is taken as 15 to 50 years. Births to mothers
under 15 and above 50 are so rare that they are not recorded separately
but are included with ages 15 and 49 respectively.
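The crude birth rate and the general fertility rate share the same numerator and differ only in the denominator. A small Python sketch with hypothetical figures (not taken from the text) makes the contrast concrete; `rate_per_thousand` is an illustrative helper.

```python
def rate_per_thousand(events, population):
    """Events per 1,000 of the stated population."""
    return events * 1000 / population


# Hypothetical area, for illustration only
live_births = 6_000
mean_population = 200_000   # whole population (crude birth rate denominator)
women_15_to_49 = 40_000     # mid-year females aged 15-49 (G.F.R. denominator)

cbr = rate_per_thousand(live_births, mean_population)   # per 1,000 population
gfr = rate_per_thousand(live_births, women_15_to_49)    # per 1,000 women aged 15-49
print(cbr, gfr)  # 30.0 150.0
```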

The G.F.R. shows how much the women in child-bearing ages have 
added to the existing population through births. It takes into account the 
sex-composition of the population and also the age-composition to a 
certain extent. Yet it is calculated without proper regard to the age- 
composition of the female population in child-bearing ages. The fecundity 
of women differs according to age-groups. In our country it is low in the 
age-group 15—19 but it increases very rapidly in the age-group 20—24 
and only slightly in 25—29 after which it gradually declines. In U.S.A. 
fecundity reaches its peak in the age-group 20—24 and thereafter declines. 
For this reason even if the general fertility rate of two populations may 
correspond to each other, we cannot assume that the fertility rate is really 
identical unless different age-groups are also taken into consideration 
in its calculation. 

Total Fertility Rate. In order to measure correctly the population 
growth we calculate the number of children born per thousand females in 
the child-bearing age divided into different age-groups. This leads to the 
total fertility rate which is calculated by adding up the specific fertility 
rates belonging to different age groups. 

The total fertility rate is the mean number of children which a female
aged 15 can expect to bear if she lives until at least the age of 50, …
… measure of the fertility conditions operating in that area during that period.


In order to calculate the total fertility rate we shall have to calculate
35 specific fertility rates (one for each single year of age from 15 to 49) and then add them. In practice we can shorten
this procedure by working in quinquennial age groups. The specific fertility rate
for the group aged x and under (x + 5) years is:

$$\text{Specific Fertility Rate} = \frac{\text{Annual births to females aged } x \text{ and under } (x+5)}{\text{Number of females aged } x \text{ and under } (x+5)} \times 1{,}000$$

Such a specific fertility rate is the rate per 1,000 per annum at which
the females in the particular age-group produce offspring.

If we add the quinquennial specific fertility rates and multiply by 5,
we shall have the total number of children which 1,000 females aged 15
will bear over their lifetimes. A calculation based on quinquennial age-groups
involves only one-fifth of the arithmetic of one based on single age
groups and is very nearly as accurate. Symbolically,

$$\text{T.F.R.} = \sum (\text{S.F. Rates}) \times i$$

where i = the magnitude of the age class.
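The quinquennial shortcut can be written out directly: compute one specific rate per five-year group, add them and multiply by the class width i = 5. A minimal Python sketch; the births and female populations below are hypothetical, invented only to illustrate the arithmetic, and the function names are illustrative.

```python
def specific_fertility_rate(births, females):
    """Annual births per 1,000 females of the age group."""
    return births * 1000 / females


def total_fertility_rate(groups, width=5):
    """T.F.R. = (sum of the age-group specific rates) x i, i = width of each age class."""
    return width * sum(specific_fertility_rate(b, f) for b, f in groups)


# Hypothetical (births, females) for the groups 15-19, 20-24, ..., 45-49
groups = [(1200, 20000), (4500, 18000), (4000, 16000), (2800, 14000),
          (1560, 12000), (500, 10000), (80, 8000)]
tfr = total_fertility_rate(groups)
print(tfr)  # 4750.0 children per 1,000 women, i.e. 4.75 per woman
```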

Limitations of Fertility Rate. The fertility rates are unsuitable for
giving an idea of the rate of population growth because they ignore the
sex of the newly born children and their mortality. If the majority of
births are those of boys the population would tend to decrease, while the
reverse will be the case if the majority of births are girls. Similarly, if
mortality is ignored a correct idea of the rate of growth of population
cannot be formed because it is possible that a number of female children may
die before reaching the child-bearing age. For measuring the rate of
growth of population we calculate the reproduction rates. Reproduction
rates are of two types:

1. Gross Reproduction Rate, and 

2. Net Reproduction Rate. 


Gross Reproduction Rate. The gross reproduction rate is the sum of
age-specific fertility rates calculated from female births for each single year
of age. It shows the rate at which mothers would be replaced by daughters
and the old generation by the new if no mother died or migrated before
reaching the upper limit of the child-bearing age, i.e., 49 years. Another
underlying assumption is that the same fertility rates continue to be in
operation. If the gross reproduction rate of a population is exactly 1, it
indicates that the sex under consideration is exactly replacing itself; if it is
less than 1, the population would decline, no matter how low the death
rate may be; and if it is more than 1, the population would increase,
provided mortality is not high enough to offset the excess. The gross reproduction rate is
computed by the following formula:

$$\text{G.R.R.} = \frac{\text{Number of female births}}{\text{Total number of births}} \times \text{Total Fertility Rate}$$

Also, G.R.R. = No. of female children born to 1,000 women.


The G.R.R. is used as a measure of the fertility in a population. It 
is useful for comparing fertility in different areas or in the same area at 
different time periods. The G.R.R. could in theory range from 0 to 
about 5. 

Gross reproduction rate has an advantage over the total fertility rate 
because in its computation we take into account only the female babies 
who are the future mothers whereas in the total fertility rate we include 
both male and female babies that are born. 


E-167 VITAL STATISTICS 


An important limitation of the gross reproduction rate is that it 
ignores the current mortality. All the girls born do not survive till they 
reach the child-bearing age. Hence the gross reproduction rate is 
misleading in that it inflates the number of potential mothers. This defect 
is removed by computing the net reproduction rate. 

The accuracy of gross reproduction rate depends on the accuracy 
with which age-specific fertility rates can be computed. The principal 
sources of error are : (1) under-registration of births, (2) mis-statements or 
inadequate statements of the age of mother at registration, and (3) errors 
in enumeration or estimates of the female population by age-groups. 

Net Reproduction Rate. Though the gross reproduction rate gives an
idea about the growth of population, it excludes the effect of mortality
on the birth rate. The net reproduction rate estimates the average number of daughters
that would be produced by women throughout their lifetime if they were
subject at each age to the fertility and mortality rates on which the
calculation is based. It thus indicates the rate at which the number of
female births would eventually grow per generation if the same fertility
and mortality rates remained in operation. A net reproduction rate of 1
indicates that, on the basis of the current fertility and female mortality,
the present female generation is exactly maintaining itself. Since both fertility
and mortality enter into its calculation, it is a better measure of
population growth. Thus the net reproduction rate represents the rate
of replenishment of that population.

The net reproduction rate is obtained by multiplying the female
specific fertility rates of each age by the proportion of female survivors
to that age in a life table and adding up the products. Allowance is
thus made for mortality. Symbolically :

$$\text{N.R.R.} = \sum_{x=15}^{49} b_x \, \frac{L_x}{l_0}$$

where $b_x$ represents female births per woman at each age $x$,

$L_x / l_0$ = number of years lived at each age $x$ per woman of the original group of females, and

$\sum_{15}^{49}$ = summation of these rates over the reproductive span of life, taken from 15 to 49 years of age.

or

$$\text{N.R.R.} = \frac{\sum (\text{No. of female births} \times \text{Survival Rate})}{1{,}000}$$

The net reproduction rate in theory can range from 0 to about 3.
The net reproduction rate cannot exceed the gross reproduction rate, for the
reason that in its calculation we also take into account mortality. The two
rates would be equal only if all the newly born daughters reached the child-bearing
age and survived through it.

1. The gross reproduction rate adjusted for the effects of mortality is called the net reproduction rate.

2. "N.R.R. measures the extent to which mothers produce female infants who survive to replace them...it measures the extent to which a generation of girl babies survive to reproduce themselves as they pass through the child-bearing age-group." (Benjamin)


If the net reproduction rate is exactly one, it indicates that on the 
basis of current fertility and mortality rates, a group of newly born 
females will exactly replace itself in the new generation, i.e., the population 
will be constant. If the net reproduction rate is below one, it indicates 
a declining population and if it is more than one, the population has a 
tendency to increase.


However, even the net reproduction rate as a measure of replace- 
ment of population cannot be much relied upon because of the following 
two reasons : 

1. It assumes constant rates of fertility and mortality over a generation.
In actual life both these rates go on changing.


2. The population of a country may become depleted more by
migration than by a falling birth rate, or the country may receive a fresh
stock of immigrants who might be more virile.


In using gross and net reproduction rates as a means of analysing
the implications of observed fertility and mortality rates for future population
development, it should be borne in mind that the age-specific fertility
and mortality rates recorded in a given country at a given time do
not actually represent the experience of any real generation of women,
and that they may be influenced by factors which are by their nature
necessarily temporary.

Illustration 1. Compute the specific fertility rate, general fertility rate, total 
fertility rate and gross reproduction rate from the data given below : 


Age group   No. of women   No. of live
(years)     (000)          births
15–19       25             800
20–24       20             2,400
25–29       18             2,000
30–34       15             1,500
35–39       12             500
40–44        6             120
45–49        4             10
Total       100            7,330


It is given that out of 7,330 the number of female births was 4,000. 
Solution. Computation of specific fertility rate :

S.F.R. (for age 15–19 years) = 800/25,000 × 1,000 = 32.0
S.F.R. (for age 20–24 years) = 2,400/20,000 × 1,000 = 120.0
S.F.R. (for age 25–29 years) = 2,000/18,000 × 1,000 = 111.1
S.F.R. (for age 30–34 years) = 1,500/15,000 × 1,000 = 100.0
S.F.R. (for age 35–39 years) = 500/12,000 × 1,000 = 41.7
S.F.R. (for age 40–44 years) = 120/6,000 × 1,000 = 20.0
S.F.R. (for age 45–49 years) = 10/4,000 × 1,000 = 2.5


$$\text{G.F.R.} = \frac{\text{No. of live births}}{\text{No. of women of 15–49 years}} \times 1{,}000 = \frac{7{,}330}{1{,}00{,}000} \times 1{,}000 = 73.3$$

It is clear from above that for the 15–19 age group S.F.R. is 32. Accordingly, 1,000
females exactly aged 15 would, by the time they reached 20, have borne 32 × 5 = 160
children. It is necessary to multiply by 5 since the specific fertility rate is a rate per
annum, and by the time the females reach the age of 20 they will have spent 5 years in
the age group 15–19.

The table below shows the number of births which 1,000 females will have
borne by the time they reach certain ages.

Exact age   S.F.R. × 5             Total births per 1,000 females
(years)                            aged 15, by stated ages
15          0                      0
20          32 × 5 = 160.0         160.0
25          120 × 5 = 600.0        760.0
30          111.1 × 5 = 555.5      1,315.5
35          100 × 5 = 500.0        1,815.5
40          41.7 × 5 = 208.5       2,024.0
45          20 × 5 = 100.0         2,124.0
50          2.5 × 5 = 12.5         2,136.5

Total fertility rate = Σ(S.F.R. × 5)/1,000 = 2,136.5/1,000 = 2.1365 per woman

G.R.R. = (No. of female births/Total No. of births) × Total fertility rate = (4,000/7,330) × 2.1365 = 1.17
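The whole of Illustration 1 can be checked with the short Python sketch below. The variable names are illustrative. Unlike the hand calculation, the sketch does not round the specific fertility rates to one decimal, so the T.F.R. comes out as about 2.136 rather than 2.1365. The G.R.R. still rounds to 1.17.

```python
# Illustration 1: women (in thousands), live births and female births by age group
women_000 = [25, 20, 18, 15, 12, 6, 4]          # 15-19, 20-24, ..., 45-49
births = [800, 2_400, 2_000, 1_500, 500, 120, 10]
female_births = 4_000
WIDTH = 5                                       # years in each age group

# S.F.R.: with women counted in thousands, births / women is per 1,000 women
sfr = [b / w for b, w in zip(births, women_000)]

# G.F.R.: all live births per 1,000 women aged 15-49
gfr = sum(births) / sum(women_000)

# T.F.R. per woman = width x sum of S.F.R. / 1,000
tfr = WIDTH * sum(sfr) / 1000

# G.R.R. = share of female births x T.F.R.
grr = female_births / sum(births) * tfr

print("S.F.R.:", [round(r, 1) for r in sfr])
print(f"G.F.R. = {gfr:.1f}, T.F.R. = {tfr:.4f}, G.R.R. = {grr:.2f}")
```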


Illustration 2. Calculate the gross and net reproduction rates from the data
given below.

Number of women in age groups and number of female children born in one
year :

Age group   Female population   Female births   Survival
            (000)                               rate
15–19       1,600               19,000          0.921
20–24       1,000               70,200          0.901
25–29       1,685               90,600          0.885
30–34       1,730               62,400          0.862
35–39       1,725               32,500          0.850
40–44       1,620               11,000          0.832
45–49       1,510                  800          0.812


Solution. The gross reproduction rate, i.e., the number of female children born to one
woman as she passes through the child-bearing ages, is the sum of the specific fertility
rates (female births) per woman for the child-bearing ages. But when data are given for
5-yearly age groups, it is equal to the sum of the specific fertility rates per woman
(female births) multiplied by 5.

S.F.R. per woman = Female births / Female population

The calculations are shown below :

Age group   Female population   Female births   Specific fertility rate per woman (S.F.R.)
15–19       16,00,000           19,000          19,000/16,00,000 = 0.0119
20–24       10,00,000           70,200          70,200/10,00,000 = 0.0702
25–29       16,85,000           90,600          90,600/16,85,000 = 0.0538
30–34       17,30,000           62,400          62,400/17,30,000 = 0.0361
35–39       17,25,000           32,500          32,500/17,25,000 = 0.0188
40–44       16,20,000           11,000          11,000/16,20,000 = 0.0068
45–49       15,10,000              800             800/15,10,000 = 0.0005
                                                Σ S.F.R. = 0.1981

G.R.R. = Σ S.F.R. × 5 = 0.1981 × 5 = 0.9905


Net Reproduction Rate. The net reproduction rate is the number of female children,
surviving till their reproductive ages, born to one woman as she passes through the
child-bearing ages. Thus it is the sum, over all ages, of the specific fertility rate per woman
× survival rate. For data expressed in 5-yearly age groups it is the sum, over the various
groups, of the specific fertility rate per woman × survival rate, multiplied by 5. The
calculations are shown below :

Age group   Female population   Female births   S.F.R. per woman   Survival rate (S)   S.F.R. × S
15–19       16,00,000           19,000          0.0119             0.921               0.0110
20–24       10,00,000           70,200          0.0702             0.901               0.0633
25–29       16,85,000           90,600          0.0538             0.885               0.0476
30–34       17,30,000           62,400          0.0361             0.862               0.0311
35–39       17,25,000           32,500          0.0188             0.850               0.0160
40–44       16,20,000           11,000          0.0068             0.832               0.0057
45–49       15,10,000              800          0.0005             0.812               0.0004
                                                                   Σ [S.F.R. × S] = 0.1751

N.R.R. = Σ [S.F.R. × S] × 5 = 0.1751 × 5 = 0.8755
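Illustration 2 can be reproduced with the Python sketch below. The sketch assumes that the 40–44 female population is 16,20,000, as in the solution table. It does not round the intermediate rates, so it gives G.R.R. ≈ 0.9904 and N.R.R. ≈ 0.8748, compared with the hand-computed 0.9905 and 0.8755.

```python
# Illustration 2: female population, female births and survival rates
female_pop = [1_600_000, 1_000_000, 1_685_000, 1_730_000,
              1_725_000, 1_620_000, 1_510_000]    # ages 15-19 ... 45-49
female_births = [19_000, 70_200, 90_600, 62_400, 32_500, 11_000, 800]
survival = [0.921, 0.901, 0.885, 0.862, 0.850, 0.832, 0.812]
WIDTH = 5                                         # years in each age group

# Specific fertility rate per woman, counting female births only
sfr = [b / p for b, p in zip(female_births, female_pop)]

grr = WIDTH * sum(sfr)                                    # no allowance for mortality
nrr = WIDTH * sum(f * s for f, s in zip(sfr, survival))   # allows for mortality

print(f"G.R.R. = {grr:.4f}, N.R.R. = {nrr:.4f}")
```

Because every survival rate is below 1, the N.R.R. necessarily comes out smaller than the G.R.R.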
Illustration 3. From the figures given below calculate the General Fertility
Rate and the Total Fertility Rate :

Age group   No. of women   Specific fertility rate
                           (per 1,000)
15–20       100            15
20–25       120            100
25–30       110            120
30–35       105            140
35–40       100            80
40–45        80            50
45–50        70            10

Solution.   CALCULATION OF GENERAL FERTILITY RATE AND TOTAL FERTILITY RATE

Age group   No. of women (W)   S.F.R. per 1,000   No. of births (W × S.F.R./1,000)
15–20       100                15                 1.5
20–25       120                100                12.0
25–30       110                120                13.2
30–35       105                140                14.7
35–40       100                80                 8.0
40–45        80                50                 4.0
45–50        70                10                 0.7
Total       685                515                54.1

G.F.R. = (Total No. of children born / Total No. of women) × 1,000 = (54.1/685) × 1,000 = 78.98 per thousand

Total Fertility Rate = Σ S.F.R. × i = 515 × 5 = 2,575 per thousand
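Here is a quick Python check of Illustration 3. It takes 80 women in the 40–45 group, as the solution table does. It confirms that the specific fertility rates add up to 515, so the T.F.R. is 515 × 5 = 2,575 per thousand. The variable names are illustrative.

```python
# Illustration 3: number of women and S.F.R. (per 1,000) by age group
women = [100, 120, 110, 105, 100, 80, 70]       # ages 15-20 ... 45-50
sfr_per_1000 = [15, 100, 120, 140, 80, 50, 10]
WIDTH = 5                                       # years in each age group

# Births in each group = women x S.F.R. / 1,000
births = [w * r / 1000 for w, r in zip(women, sfr_per_1000)]

gfr = sum(births) / sum(women) * 1000           # per 1,000 women
tfr = WIDTH * sum(sfr_per_1000)                 # per 1,000 women

print(f"Births = {sum(births):.1f}, G.F.R. = {gfr:.2f}, T.F.R. = {tfr}")
```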
Illustration 4. From the following figures calculate the female gross reproduction
rate if the ratio of male and female children be 48 : 52.

Age group   No. of children born to 1,000 women
15–19       50
20–24       180
25–29       200
30–34       150
35–39       80
40–44       40
45–49       10
Solution.   CALCULATION OF GROSS REPRODUCTION RATE

Age group   No. of children born to 1,000 women   No. of female children born
15–19       50                                    50 × 52/100 = 26.0
20–24       180                                   180 × 52/100 = 93.6
25–29       200                                   200 × 52/100 = 104.0
30–34       150                                   150 × 52/100 = 78.0
35–39       80                                    80 × 52/100 = 41.6
40–44       40                                    40 × 52/100 = 20.8
45–49       10                                    10 × 52/100 = 5.2
Total       710                                   369.2

G.R.R. = Total female births to 1,000 women / 1,000 = 369.2/1,000 = 0.3692 per woman

[It is assumed that the numbers of children born (given) are the total children
born during the time the 1,000 women passed through the age group, i.e., in 5 years.]
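In Python the same steps reduce to a weighted sum. The sketch assumes, as the bracketed note does, that each figure already covers the whole 5-year stay in the age group, so no multiplication by 5 is needed. The variable names are illustrative.

```python
# Illustration 4: children born to 1,000 women passing through each age group
children = [50, 180, 200, 150, 80, 40, 10]   # ages 15-19 ... 45-49
FEMALE_SHARE = 52 / (48 + 52)                # males : females = 48 : 52

# Female children born to 1,000 women in each age group
female_children = [c * FEMALE_SHARE for c in children]

grr = sum(female_children) / 1000            # female children per woman

print([round(x, 1) for x in female_children])
print(f"G.R.R. = {grr:.4f} per woman")
```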


Illustration 5. Calculate the net reproduction rate from the following data :


Age group of          No. of female children born to   No. of survivors out of each
child-bearing         1,000 women passing through      1,000 female children
females               each age group
15–20                 50                               850
20–25                 180                              800
25–30                 450                              750
30–35                 500                              700
35–40                 300                              650
40–45                 100                              600
45–50                 40                               500
(7.р.С. Final Com. Raj., 1972) 
Solution.   CALCULATION OF NET REPRODUCTION RATE

Age group   No. of female children   No. of survivors   No. of survivors who
            born to 1,000 women                         replace the present women
15–20       50                       850                50 × 850/1,000 = 42.5
20–25       180                      800                180 × 800/1,000 = 144.0
25–30       450                      750                450 × 750/1,000 = 337.5
30–35       500                      700                500 × 700/1,000 = 350.0
35–40       300                      650                300 × 650/1,000 = 195.0
40–45       100                      600                100 × 600/1,000 = 60.0
45–50       40                       500                40 × 500/1,000 = 20.0
Total       1,620                                       1,149.0

N.R.R. = Σ(No. of female births × Survival rate)/1,000 = 1,149/1,000 = 1.149
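The Python sketch below carries out the same calculation. The G.R.R. line goes beyond what the illustration asks for. It is included only to show that, with the same births, the N.R.R. cannot exceed the G.R.R.

```python
# Illustration 5: female children born to 1,000 women passing through each
# age group, and survivors out of each 1,000 female children
female_births = [50, 180, 450, 500, 300, 100, 40]   # ages 15-20 ... 45-50
survivors = [850, 800, 750, 700, 650, 600, 500]

# Survivors who replace the present women, per 1,000 women
replacing = [b * s / 1000 for b, s in zip(female_births, survivors)]

nrr = sum(replacing) / 1000          # allows for mortality
grr = sum(female_births) / 1000      # same births, no allowance for mortality

print(f"N.R.R. = {nrr:.3f}, G.R.R. = {grr:.2f}")
```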


Illustration 6. From the data given below calculate the gross reproduction rate
and net reproduction rate :

Age group   No. of children born to 1,000 women   Mortality rate
            passing through the age group          (per 1,000)
16–20       150                                   120
21–25       1,500                                 180
26–30       2,000                                 150
31–35       800                                   200
36–40       500                                   220
41–45       200                                   230
46–50       100                                   250

Sex ratio being males : females = 52 : 48.


Solution.   CALCULATION OF G.R.R. AND N.R.R.

Age group   No. of children born     No. of female children   Survival rate         No. of female children
            to 1,000 women passing   (48% of total)           (1,000 − mortality    survived
            through the age group                             rate)
16–20       150                      150 × 48/100 = 72        880                   72 × 880/1,000 = 63.36
21–25       1,500                    1,500 × 48/100 = 720     820                   720 × 820/1,000 = 590.40
26–30       2,000                    2,000 × 48/100 = 960     850                   960 × 850/1,000 = 816.00
31–35       800                      800 × 48/100 = 384       800                   384 × 800/1,000 = 307.20
36–40       500                      500 × 48/100 = 240       780                   240 × 780/1,000 = 187.20
41–45       200                      200 × 48/100 = 96        770                   96 × 770/1,000 = 73.92
46–50       100                      100 × 48/100 = 48        750                   48 × 750/1,000 = 36.00
Total                                2,520                                          2,074.08

G.R.R. = Total No. of female children born to 1,000 women / 1,000 = 2,520/1,000 = 2.52 per woman

N.R.R. = No. of female children born and survived to 1,000 women / 1,000 = 2,074.08/1,000 = 2.074 per woman
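The following Python sketch reproduces Illustration 6. It converts each mortality rate per 1,000 into a survival proportion, (1,000 − mortality)/1,000. The variable names are illustrative.

```python
# Illustration 6: children born to 1,000 women and mortality rates per 1,000
children = [150, 1500, 2000, 800, 500, 200, 100]     # ages 16-20 ... 46-50
mortality = [120, 180, 150, 200, 220, 230, 250]
FEMALE_SHARE = 48 / (52 + 48)                        # males : females = 52 : 48

female = [c * FEMALE_SHARE for c in children]
survival = [(1000 - m) / 1000 for m in mortality]    # proportion surviving
surviving = [f * s for f, s in zip(female, survival)]

grr = sum(female) / 1000       # female children per woman
nrr = sum(surviving) / 1000    # surviving female children per woman
print(f"G.R.R. = {grr:.2f}, N.R.R. = {nrr:.3f}")
```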


Measurement of Mortality 
The following rates are used for measuring mortality. 
Crude Death Rate 
The annual crude death rate is defined as :

$$\text{Crude Death Rate} = \frac{\text{Annual Deaths}}{\text{Annual Mean Population}} \times 1{,}000$$

The crude death rate for a given year tells us at what rate deaths have
depleted the population over the course of the year. We can calculate
the crude death rate for males and females separately.

The crude death rate usually lies between 8 and 30 per 1,000. The
female rate is generally lower than the male rate. In most countries crude
death rates have fallen substantially over the past half century or so.

The level of the crude death rate is determined by :

(i) the sex and age distribution of population ; and

(ii) the mortality of the population, i.e., average longevity of the
population.

An old population can exhibit a relatively high crude death rate
even if longevity is high (i.e., mortality is low).

The crude death rate, which measures the decrease in a population
due to deaths, is perhaps the most widely used of any vital statistics rate.
This is so for two reasons :

(1) It is relatively easy to compute.

(2) It has value as an index in numerous demographic and public 
health problems. 

However, the death rate so computed is likely to be misleading, especially
when it is required to compare the death rates in two areas or in two
occupations. This is because mortality varies with sex and age,
whereas the crude death rate masks all age and sex differentials. It assumes
that the age-sex structures of the populations being compared are the same.
However, in practice it is not so. Populations composed of a high proportion
of persons at the older ages, where mortality is higher, will naturally
show a higher crude death rate than a ‘younger’ population.

The crude death rate may be used for comparing the mortality
situations of the same place at different times, provided the periods compared
are not too far apart, because in a stable, large community the
age and sex compositions of the population change very slowly. If the
time trend is studied for a long period of years the effect of population
changes must be examined. Greater caution is necessary for comparison
between areas, since rather significant differences in crude death rates
may arise entirely from differences in the age-sex distribution of the
populations.

However, where it is known that the population distributions are 
approximately similar, or where the crude rate differences are large, as 
in many international comparisons, the crude rate has great value as an 
index of mortality. 

Specific Death Rates. By themselves, crude death rates are not
enough for a detailed study of the mortality conditions in a community.
We often need to know more about deaths occurring in different sections
of the population. For instance, people interested in infant or child
welfare work study deaths taking place under 1 year of age or in such age
groups as 1–4 years, 5–9 years, etc. Those interested in maternal
health study how many deaths occurred among women of child-bearing
age. Insurance companies are interested in deaths occurring at different
ages of the population.

The formula for computing specific death rate is :

$$\text{Annual death rate specific for age} = \frac{\text{Number of deaths which occurred among a specific age group of the population of a given geographic area during a given year}}{\text{Mid-year population of the specified age group in the given geographic area during the same year}} \times 1{,}000$$

These rates measure the risk of dying in each of the age groups
selected for the computation. Usually such rates are computed for the
entire life span, and are further specified by sex, so that rates specific
for age and sex are available. The specificity by age and sex eliminates
the differences which would be due to variation in population composition
in respect of these characteristics, and to this extent such rates can be
compared from one geographic area to another and from one time period
to another. However, it does not eliminate other variables which also
may be important, such as "occupation", "literacy", and the like. Nevertheless,
for general analytical purposes, the death rate specific for
age and sex is one of the most important and widely applicable types
of death rates. It also supplies one of the essential components required
for computation of net reproduction rates and life tables.

Standardized Death Rate. The criticism of the crude death rate 
is that while making inter-area comparisons it fails to take account of 
differences in the age (or age-sex) structure of the populations in question 
and, thus, fails to reveal the "real" mortality. It has been suggested, 
therefore, that the crude rates be "adjusted" to allow for the known 
differences in the age composition of the population involved. Several 
methods have been proposed and different names have been applied to 
the results. Some workers have called these hypothetical indices "adjusted
rates", others "standardized rates", still others "corrected rates".
Perhaps the most appropriate term is "adjusted rates", used with a prefix
to identify the basis of the adjustment as, for example, "age-adjusted
death rate", and so forth.

There are two principal methods of age adjustment : (1) the direct
method ; and (2) the indirect method.

Direct Method. The direct method gives the death rate that would occur
in some standard population if it had the mortality of the given community,
or the death rate that would occur in the community if its population were
distributed as that of the standard. The direct method of adjusting for age
consists of weighting the specific rates not by the population of the area
to which they refer, as is implied in the computation of the crude rate,
but by the population distribution of another area, chosen as a standard.
In the direct method, the rates specific for one geographic area are multiplied
by the corresponding populations of another area which, for this
purpose, is considered as a 'standard'. The resulting expected numbers
for the age groups are summed, and the total is divided by the total
standard population to obtain an age-adjusted "rate".*
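The direct method amounts to a weighted average of age-specific rates, with the standard population as the weights. The Python sketch below applies it to the figures for towns A and B given in Illustration 7 of this section, taking town A's population as the standard. The function names are hypothetical. The sketch should reproduce the standardized rates of 23.11 and 23.89 worked out there by hand.

```python
def specific_rates(deaths, population):
    """Age-specific death rates per 1,000."""
    return [d / p * 1000 for d, p in zip(deaths, population)]


def direct_standardized_rate(rates_per_1000, standard_pop):
    """Direct method: apply a community's age-specific rates to a standard
    population and return the resulting death rate per 1,000."""
    expected_deaths = sum(r * p / 1000 for r, p in zip(rates_per_1000, standard_pop))
    return expected_deaths / sum(standard_pop) * 1000


# Towns A and B of Illustration 7 (below 5, 5-30, above 30); A is the standard
pop_a, deaths_a = [15_000, 20_000, 10_000], [360, 400, 280]
pop_b, deaths_b = [40_000, 52_000, 8_000], [1_000, 1_040, 240]

sdr_a = direct_standardized_rate(specific_rates(deaths_a, pop_a), pop_a)
sdr_b = direct_standardized_rate(specific_rates(deaths_b, pop_b), pop_a)
print(f"S.D.R. A = {sdr_a:.2f}, S.D.R. B = {sdr_b:.2f}")
```

Because town A is itself the standard, its standardized rate equals its crude rate.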

The obvious defect of the direct method as a means of adjusting for
the age differences of several populations is that it entails the choice
of a 'standard' population. The choice of this 'standard' will naturally
affect the magnitude of the resulting adjusted rates and may change their
relative positions with respect to each other. However, in eliminating
bias on a national basis, it is customary to use the total population of the
country as 'standard' for adjusting the rates of the regions within the
country. There is no generally accepted standard population for international
comparisons.

Indirect Method. The indirect method adjusts the crude death rate of 
the community by applying to it a factor measuring the relative “mortality 
proneness" of the population of the community. 

In this method the "standard" is a set of specific rates, rather than a
population distributed by age. To compute an age-adjusted rate by the
indirect method, one requires the population of the area distributed by age.
These given populations are multiplied age by age by the "standard" age-specific
rates to obtain the expected number of events in the standard area
if it were subject to the given age distribution. The sum of these
"expected events" divided by the population in the area under consideration
gives an 'expected rate' or 'index rate' in the "standard area",
one which is dependent solely on the sex-age constitution of the population
and as a rule may be treated with sufficient accuracy as remaining constant
over a period of years adjacent to the experience period. The "index
rate" is customarily divided into the crude rate of the "standard" area
(i.e., the crude rate of the standard area is divided by the index rate),
and the resulting ratio is known as an "adjustment factor". Multiplying the
crude rate of the area under consideration by this factor gives its adjusted rate. Both the
direct and indirect methods of age adjustment have been criticized on the
ground that the rates obtained are dependent on the age and sex structure
of the standard population used, and that the greater gains in mortality
reduction obtained at younger ages are not adequately accounted for.

* Handbook of Vital Statistical Methods, Statistical Office of the United Nations.
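Here is a sketch of the indirect method using the same conventions: rates are per 1,000, and the function name is hypothetical. As an assumed example, town A of Illustration 7 supplies the standard age-specific rates and the standard crude rate, and town B's crude rate of 22.80 is adjusted. The text does not work this case, so the result (about 23.69) is offered only as an illustration. It need not coincide with the directly standardized figure.

```python
def indirect_adjusted_rate(local_pop, local_deaths, standard_rates, standard_cdr):
    """Indirect method of age adjustment.

    local_pop      -- population of the area by age group
    local_deaths   -- total deaths in the area (all ages)
    standard_rates -- standard age-specific death rates per 1,000
    standard_cdr   -- crude death rate of the standard area per 1,000
    """
    crude = local_deaths / sum(local_pop) * 1000
    # Deaths expected in the area if the standard rates applied to its age structure
    expected = sum(p * r / 1000 for p, r in zip(local_pop, standard_rates))
    index_rate = expected / sum(local_pop) * 1000
    factor = standard_cdr / index_rate          # "adjustment factor"
    return crude * factor


# Town A of Illustration 7 as the standard, town B adjusted
rates_a = [24.0, 20.0, 28.0]                    # per 1,000: below 5, 5-30, above 30
cdr_a = 1_040 / 45_000 * 1000
adj_b = indirect_adjusted_rate([40_000, 52_000, 8_000], 2_280, rates_a, cdr_a)
print(f"Indirectly adjusted death rate of B = {adj_b:.2f}")
```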


Illustration 7. Compute the crude and standardized death rates of the two 
populations A and B from the following data : 


                     A                        B
Age-group    Population   Deaths     Population   Deaths
(years)
Below 5      15,000       360        40,000       1,000
5–30         20,000       400        52,000       1,040
Above 30     10,000       280         8,000         240
Total        45,000       1,040      1,00,000     2,280


(B. Com., Andhra, 1972) 


Solution. Crude Death Rate = (N/P) × 1,000, where N = No. of deaths, P = Population

C.D.R. for town A = (1,040/45,000) × 1,000 = 23.11
C.D.R. for town B = (2,280/1,00,000) × 1,000 = 22.80

Standardized death rate taking population of town A as standard population

              A                                        B
Age-group   Population   Deaths   Death rate       Population   Deaths   Death rate
(years)                           per thousand                           per thousand
Below 5     15,000       360      24               40,000       1,000    25
5–30        20,000       400      20               52,000       1,040    20
Above 30    10,000       280      28                8,000         240    30
Total       45,000       1,040                     1,00,000     2,280

Standardized Death Rate (town A) = [(15,000 × 24) + (20,000 × 20) + (10,000 × 28)] / 45,000
= (3,60,000 + 4,00,000 + 2,80,000)/45,000 = 10,40,000/45,000 = 23.11

Standardized Death Rate (town B) = [(15,000 × 25) + (20,000 × 20) + (10,000 × 30)] / 45,000
= (3,75,000 + 4,00,000 + 3,00,000)/45,000 = 10,75,000/45,000 = 23.89




We can now say that the death rate in town B is higher than in town A.

Another method of computing the standardized death rate is to take some assumed
population (that is, the population of neither town A nor town B) as standard. But this
method is not so popular.

Illustration 8. From the following data of the two towns A and B, which
would you consider to be more healthy (assume town B as standard) :

              Town A                  Town B
Age           Population   Deaths     Population   Deaths
0–15          10,000       200        15,000       370
15–50         18,000       500        20,000       600
50 & above     2,000        50         5,000       100
Solution.   CALCULATION OF STANDARDIZED DEATH RATE

              Town A                                Town B
Age           Population   Deaths   Death rate     Population   Deaths   Death rate
                                    (per 1,000)                          (per 1,000)
0–15          10,000       200      20.0           15,000       370      24.7
15–50         18,000       500      27.8           20,000       600      30.0
50 & above     2,000        50      25.0            5,000       100      20.0
Total         30,000       750                     40,000       1,070

C.D.R. of town B = (N/P) × 1,000 = (1,070/40,000) × 1,000 = 26.75

This is also the standardized death rate of town B, because town B is taken as
standard.

S.D.R. of town A = [(15,000 × 20) + (20,000 × 27.8) + (5,000 × 25)] / (15,000 + 20,000 + 5,000)
= (3,00,000 + 5,56,000 + 1,25,000)/40,000 = 9,81,000/40,000 = 24.52

Since the S.D.R. of town A is less than that of town B, town A is more healthy.
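The Python sketch below applies the direct method to Illustration 8, with town B's population as the standard. The function and variable names are illustrative. Carrying town A's 15–50 rate unrounded (27.78 rather than 27.8) gives an S.D.R. of about 24.51 instead of 24.52. Town A still comes out healthier.

```python
def direct_standardized_rate(rates_per_1000, standard_pop):
    """Apply age-specific death rates (per 1,000) to a standard population."""
    expected = sum(r * p / 1000 for r, p in zip(rates_per_1000, standard_pop))
    return expected / sum(standard_pop) * 1000


# Illustration 8: age groups 0-15, 15-50, 50 & above; town B is the standard
pop_a, deaths_a = [10_000, 18_000, 2_000], [200, 500, 50]
pop_b, deaths_b = [15_000, 20_000, 5_000], [370, 600, 100]

rates_a = [d / p * 1000 for d, p in zip(deaths_a, pop_a)]
rates_b = [d / p * 1000 for d, p in zip(deaths_b, pop_b)]

sdr_a = direct_standardized_rate(rates_a, pop_b)
sdr_b = direct_standardized_rate(rates_b, pop_b)   # equals B's crude rate
print(f"S.D.R. A = {sdr_a:.2f}, S.D.R. B = {sdr_b:.2f}")
```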
Infant Mortality Rate. Infant mortality rates serve as one of the
best indices to the general "healthiness" of a society. The rate is similar to an age-specific
death rate for infants under 1 year of age. It is defined as :

$$\text{Infant Mortality Rate} = \frac{\text{Number of deaths under 1 year of age which occurred among the population of a given geographic area during a given year}}{\text{Number of live births which occurred among the population of the given geographic area during the same year}} \times 1{,}000$$


The rate approximately measures for a given year the chances of a birth 
failing to survive one year of life. Still-births are not included in the infant 
deaths. The rate can be calculated for males and females separately. 

The infant mortality rate varies considerably according to time and 
place. In countries with high standards of maternal and infant welfare it 
is as low as 15 to 20 per 1,000 but in some underdeveloped countries it is 
still well over 100 per 1,000. In many countries it has fallen spectacularly 


over the past sixty years or so. The male rate is invariably higher than
the female rate.
The infant mortality rate is of great value in the field of public
health and its correct computation and interpretation is important. In 
most countries, the great risk of death at ages under 1 is not equalled 
again in the life span until very old age is reached. But in contrast to 
death at older ages, infant deaths are more responsive to improvement in 
environmental and medical conditions. 

Neo-Natal Mortality Rate. The neo-natal mortality rate, like the
infant mortality rate, is similar to an age-specific rate. It is used to
measure the risk of death during the first month of life. This rate is
defined as :

$$\text{Annual Neo-natal Mortality Rate} = \frac{\text{Annual deaths of infants under the age of 1 month among the population of a given geographic area}}{\text{Number of live births which occurred among the population of the given geographic area during the same year}} \times 1{,}000$$

The rate measures, for a given year, the chance of a live birth failing to survive
one month of life. Most infant deaths occur within the first month of
life, so the neo-natal mortality rate represents to a very large extent the hard
core of infant mortality. Of the neo-natal deaths, most occur within the
first week of life.

In interpreting neo-natal rates care must be taken to evaluate the
probable effect of under-registration of live births in relation to infant
deaths. It is likely that infant deaths under 1 month are registered less
completely than any other infant deaths, and the two sources of incompleteness
in the rate probably compensate for each other to some extent.

Maternal Mortality Rate. The risk of dying from causes associated 
with child-birth is measured by the maternal mortality rate. For this 
purpose the deaths used in the numerator are those arising from puerperal 
causes, i.e., deliveries and complications of pregnancy, child-birth and the 
puerperium. 

The numbers exposed to the risk of dying from puerperal causes 
are women who have been pregnant during the period. Their number 
being unknown the number of live births is used as the conventional base 
for computing comparable maternal mortality rates. The formula is : 

$$\text{Annual maternal mortality rate} = \frac{\text{Number of deaths from puerperal causes which occurred among the female population of a given geographic area during a given year}}{\text{Number of live births which occurred among the population of the given geographic area during the same year}} \times 1{,}000$$
The classification and coding of deaths as puerperal deaths vary from one 
country to another or even within the same country and hence we must be 
cautious in comparing maternal mortality rates for different places. 
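The infant, neo-natal and maternal mortality rates all use the number of live births as their base, so a single helper covers all three. The sketch uses hypothetical counts chosen only to show the arithmetic, and the function name is illustrative.

```python
def rate_per_1000_live_births(events, live_births):
    """Infant, neo-natal and maternal mortality rates all use live births as base."""
    return events / live_births * 1000


# Hypothetical one-year figures for an area (not taken from the text)
live_births = 25_000
infant_deaths = 1_500        # deaths under 1 year
neonatal_deaths = 900        # deaths under 1 month (a subset of infant deaths)
puerperal_deaths = 40        # deaths from puerperal causes

imr = rate_per_1000_live_births(infant_deaths, live_births)
nmr = rate_per_1000_live_births(neonatal_deaths, live_births)
mmr = rate_per_1000_live_births(puerperal_deaths, live_births)
print(f"I.M.R. = {imr:.1f}, neo-natal = {nmr:.1f}, maternal = {mmr:.1f}")
```

Since neo-natal deaths are a subset of infant deaths, the neo-natal rate can never exceed the infant mortality rate for the same area and year.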


Natural Increase Rate. The crude birth rate reveals the proportion
by which the population is increased through the addition of new
members. The crude death rate measures the "toll" of the same population.
Rates of natural increase or decrease, that is, rates computed on
the balance of births and deaths, give some measure of the overall gain or
loss in a population through the addition of births and the subtraction of
deaths.

The annual rate of natural increase can be computed simply by
subtracting the crude death rate from the crude birth rate. The
formula for such a rate is as follows :

$$\text{Annual natural increase rate} = \frac{\text{Number of births in a given geographic area during a given year} - \text{corresponding number of deaths}}{\text{Mid-year population of the given geographic area during the same year}} \times 1{,}000$$

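Rate of Net Migration. The annual rate of net migration is defined as :

$$\text{Rate of net migration} = \frac{\text{Annual net migration}}{\text{Annual mean population}} \times 1{,}000 = \frac{\text{Overseas arrivals} - \text{Overseas departures}}{\text{Annual mean population}} \times 1{,}000$$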


The rate of net migration for a given year tells us at what rate net 
migration has augmented the population over the course of the year. 
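A minimal Python sketch of these rates, using hypothetical one-year figures. For simplicity it assumes that a single figure can serve as both the mid-year and the annual mean population. The helper name `per_1000` is illustrative.

```python
def per_1000(numerator, population):
    """Express a count as a rate per 1,000 of the population."""
    return numerator / population * 1000


# Hypothetical one-year figures for a country (not taken from the text)
mean_population = 2_000_000
births, deaths = 60_000, 24_000
arrivals, departures = 9_000, 5_000

cbr = per_1000(births, mean_population)                  # crude birth rate
cdr = per_1000(deaths, mean_population)                  # crude death rate
natural_increase = per_1000(births - deaths, mean_population)
net_migration = per_1000(arrivals - departures, mean_population)

# The natural increase rate equals the crude birth rate minus the crude death rate
assert abs(natural_increase - (cbr - cdr)) < 1e-9
print(f"C.B.R. {cbr:.1f}, C.D.R. {cdr:.1f}, natural increase {natural_increase:.1f}, "
      f"net migration {net_migration:.1f} per 1,000")
```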
LIFE TABLES 


For public health purposes, the force of mortality in a population is
usually measured by means of such indices as crude death rate, infant
mortality rate, specific death rate at different ages by sex, etc. Another effective
and at the same time comprehensive method of describing mortality
in a population is by means of life tables. A life table is composed of
several sets of values showing how a group of infants, all supposed to be
born at the same time and experiencing unchanging mortality conditions,
would gradually die out. In other words, the life table is a convenient
method for summarizing the mortality experience of any population
group; that is, it provides concise measures of the longevity of that population.
Such tables are usually worked out after each decennial census to
represent mortality conditions either during the previous decennium or
during shorter periods covering the date of the census. Separate tables
for males and females are usually prepared. For detailed study it is not
uncommon to construct tables for each geographical subdivision of a
country or for different population segments. A life table can also be constructed
to show how a group of babies would die if, hypothetically, one
or more causes of death were eliminated. In recent years life table
techniques are being increasingly applied to follow-up studies of chronic
diseases or hospital patients.


Uses of Life Table

assurance offices. It forms the basis for the calculation of premiums necessary
for various amounts of life assurance. It provides insurance companies

in the event of death. The economic soundness of this scientific approach
is testified by the gigantic buildings owned by insurance companies all over
the globe. Today the life table is widely accepted as important basic
material in demographic and public health studies. William Farr called
the life table the "Biometer" of the population. Some of the specific
uses of life tables are :

1. The preparation of population projections by age and sex ;

2. analysis of effects of mortality on the age and sex composition of
a population ;

3. comparisons of summarizing measures of mortality, such as the life-table
death rate (the reciprocal of the expectation of life at birth),
expectation of life at various ages, etc. ;

4. computation of net reproduction rates ; and

5. the appraisal of the accuracy of census enumerations and vital
registration data.

In addition, life table techniques have been applied to the analysis
of other types of demographic data ; for example, in the computation of
probabilities of marriage, specific for age and sex, on the basis of census data
classified by marital status.

To the student of population, life tables are a valuable instrument.
They enable him to study population growth and forecast the size and 
distribution of populations at a future date, under certain assumptions. 
With the help of a life table, demographers have devised measures such as 
“net reproduction rate" to describe the true rate at which a population is 
increasing. Other examples of the use of life tables are : 

1. Estimation of the size of the future labour force ;

2. forecast of the school-going population in connection with school building programmes ;

3. estimation of the probable number of future orphans in a community ;

4. computation of the probable number of future widows ; and

5. study of trends in the age distribution of a population.

However, the accuracy and usefulness of life tables depend mainly upon the accuracy and completeness of the registration of deaths and of the enumeration of the population at the census. Deficiencies in death registration are likely to be greater than those in census enumeration.

The accuracy and the international comparability of life table values are particularly suspect at the higher ages. Differences in the methods used for constructing life tables (adjustment of data, graduation, etc.) may affect the reliability of the results and impair their international comparability. The effect of such differences is, however, probably much smaller than that of deficiencies in censuses and in death registration.

Construction of a Life Table

It is usual to construct a life table showing the survival and death occurring in a generation of 1,00,000 babies. We ask what would happen if these 1,00,000 babies, born at the same time, were subjected at various ages to the mortality influences found to be operating on the population at a certain period of time. How many of these 1,00,000 would live to see their first birthday, how many their second birthday, how many their third birthday, and so on, till the last one dies? On the basis of the mortality rates operating at the time under study, we can estimate the number who would be alive at the first birthday by applying the mortality rate during the first year to the 1,00,000 children. By applying the mortality rate of the second year to the number surviving at the end of the first year, we estimate the number who would survive to the end of the second year of life. Similarly for other ages. These numbers of survivors to various ages form the basic data set out in a life table. From these we can also calculate the average lifetime a person can expect to live after any age. The life table containing 8 columns is shown below :


LIFE TABLE

Age   Living at   Dying between    Mortality  Survival  Living between   Living above   Mean after-
      age x       ages x and x+1   rate       rate      ages x and x+1   age x          lifetime at age x
x     l_x         d_x              q_x        p_x       L_x              T_x            e°_x
(1)   (2)         (3)              (4)        (5)       (6)              (7)            (8)
0     1,42,759    27,124           .19000     .81000    1,23,143         46,32,557      32.45
1     1,15,635     7,059           .06104     .93896    1,11,889         45,09,414      39.00
2     1,08,576     3,960           .03648     .96352    1,06,591         43,97,525      40.50
3     1,04,616     2,610           .02495     .97505    1,03,392         42,90,934      41.02
4     1,02,006     2,006           .01967     .98033    1,01,121         41,87,542      41.05
5     1,00,000     1,710           .01710     .98290      99,145         40,86,420      40.86
6       98,290     1,591           .01620     .98380      97,494         39,87,275      40.57
7       96,698     1,483           .01534     .98466      95,957         38,89,781      40.23
8       95,215     1,383           .01452     .98548      94,523         37,93,824      39.84
9       93,832     1,293           .01378     .98622      93,186         36,99,301      39.42
10      92,539     1,210           .01308     .98692      91,934         36,06,115      38.97
...
95         ...         9           .40957     .59043          16                32       1.52
96          12         6           .42932     .57068           9                16       1.34
...
99           1         1           .49176     .50823         ...               ...        ...


1. The first column, denoted by x, gives the exact years of age, starting from 0, 1, 2, ... up to 99.

2. The second column, $l_x$, gives the number of persons living at exact age x out of an assumed number of births $l_0$ (called the cohort or radix), which is 1,42,759 in the table above.

3. The third column, $d_x$, gives the number of persons among the $l_x$ persons reaching age x who die before reaching age x + 1. Thus

$$d_x = l_x - l_{x+1}$$

Thus for the life table given above, $d_0 = 142759 - 115635 = 27124$ ; and so on.


4. The fourth column, $q_x$, gives the mortality rates to which the cohort would be exposed throughout their lives. It should be noted that $q_x$ is not the same as the age-specific death rate obtained from death registration records. For the life table death rate the denominator is the number of people alive at exact age x and the numerator the number dying between age x and x + 1. It follows that

$$q_x = \frac{d_x}{l_x}$$

Thus for the life table given above, corresponding to x = 0 and x = 1,

$$q_0 = \frac{27124}{142759} = .19000 ; \qquad q_1 = \frac{7059}{115635} = .06104, \text{ etc.}$$


5. The fifth column, $p_x$, gives the probability that a person of precise age x will survive till his next birthday. Since a person must either live or die in a particular year of life, $q_x + p_x = 1$ and $p_x = 1 - q_x$. Thus for the table given on page E-16·21, for x = 0, $p_0 = 1 - .19000 = .81000$ ; for x = 1, $p_1 = 1 - .06104 = .93896$, etc.

6. The sixth column, $L_x$, gives the number of years lived in the aggregate by the cohort between age x and x + 1. The $L_x$ column gives the distribution of the life table stationary population. For finding $L_x$, the following formula is used :

$$L_x = l_x - \tfrac{1}{2} d_x$$

Thus for age 5, $L_5 = 100000 - \tfrac{1}{2} \times 1710 = 99145$.

This is on the assumption that deaths for each year of age are evenly spread throughout the year.

$L_x$ can also be calculated as follows :

$$L_x = \frac{l_x + l_{x+1}}{2}$$

7. The seventh column, $T_x$, gives the number of years lived by the group from age x until all of them die. Thus

$$T_x = L_x + L_{x+1} + L_{x+2} + \cdots \quad \text{or} \quad T_x = \sum_{a \ge x} L_a,$$

the sum running to the last age in the table. Thus for the specimen table

$$T_0 = 123143 + 111889 + 106591 + \cdots + 16 + 9 + 5 + 2 = 4632557$$


8. The last column, $e^{\circ}_x$, measures the average number of years a person of a given age x can be expected to live under the prevailing mortality conditions. The expectation of life at age x is obtained from the following relation :

$$e^{\circ}_x = \frac{T_x}{l_x}$$

Thus for x = 5, $e^{\circ}_5 = \dfrac{4086420}{100000} = 40.86$ ; for x = 95, $e^{\circ}_{95} = 1.52$, etc.
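The column relations above can be checked mechanically. The following Python sketch is illustrative only: it assumes, as the text does, that deaths are spread evenly over each year, so that $L_x = (l_x + l_{x+1})/2$, and it needs $T_x$ at the first age supplied because $T_x$ sums $L$ over all later ages of the full table. The function names `survivors` and `life_table` are hypothetical. `survivors` also sketches the construction described earlier, applying each $q_x$ year by year to the radix. The specimen table's $L_x$ entries for the first few ages do not follow the half-sum rule (presumably because infant deaths are not evenly spread), so the example starts at age 5.

```python
def survivors(radix, q_values):
    """Apply successive one-year mortality rates q_x to a cohort of `radix` births.

    Returns [l_0, l_1, ..., l_n] with l_{x+1} = l_x * (1 - q_x).
    """
    l = [float(radix)]
    for q in q_values:
        l.append(l[-1] * (1 - q))
    return l


def life_table(ages, l, T_first):
    """Build rows (x, l_x, d_x, q_x, p_x, L_x, T_x, e_x).

    `l` holds one more value than `ages` (the survivors at the age after the
    last row); T_first is T_x at the first age.
    """
    rows = []
    T = T_first
    for i, x in enumerate(ages):
        d = l[i] - l[i + 1]          # d_x = l_x - l_{x+1}
        q = d / l[i]                 # q_x = d_x / l_x
        p = 1 - q                    # p_x = 1 - q_x
        L = (l[i] + l[i + 1]) / 2    # L_x, assuming deaths spread evenly over the year
        e = T / l[i]                 # e_x = T_x / l_x
        rows.append((x, l[i], d, q, p, L, T, e))
        T -= L                       # T_{x+1} = T_x - L_x
    return rows


# Ages 5 to 9 of the specimen table; l_10 = 92,539 closes the last row.
l_values = [100000, 98290, 96698, 95215, 93832, 92539]
for x, l, d, q, p, L, T, e in life_table(range(5, 10), l_values, 4086420):
    print(f"{x:>2} {l:>8.0f} {d:>6.0f} {q:.5f} {p:.5f} {L:>9.1f} {T:>10.1f} {e:6.2f}")

# Construction from the radix: q_0 and q_1 applied to 1,42,759 births.
print([round(v) for v in survivors(142759, [0.19000, 0.06104])])  # [142759, 115635, 108576]
```

With these inputs the rows reproduce the table's $L_5 = 99145$, $T_9 = 3699301$ and $e^{\circ}_5 = 40.86$; the computed $d_6$ comes out as 1,592 rather than the printed 1,591, a one-unit rounding difference in the source figures.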


Details of procedure in life table construction vary somewhat de- 
pending in large measure upon the availability and form of basic data. 
The basic data ordinarily consist of : 

(1) deaths in a given period classified according to age and sex, and

(2) the mean population of each age and sex during that period. 

Illustration 9. Fill in the blanks in the following life table :

x     l_x       d_x    p_x    q_x    L_x    T_x
30    762227    ?      ?      ?      ?      27296732
31    758580                                ?

Solution : For x = 30,

$$d_{30} = l_{30} - l_{31} = 762227 - 758580 = 3647$$

$$p_{30} = \frac{l_{31}}{l_{30}} = \frac{758580}{762227} = 0.995$$

$$q_{30} = 1 - p_{30} = 1 - 0.995 = 0.005$$

$$L_{30} = \frac{l_{30} + l_{31}}{2} = \frac{762227 + 758580}{2} = 760404$$

$$T_{31} = T_{30} - L_{30} = 27296732 - 760404 = 26536328$$


Illustration 10. Fill in the blanks of the following skeleton life table which are marked with question marks :

Age x   l_x     d_x    q_x    p_x    L_x    T_x       e°_x
9       93832   1293   ?      ?      ?      3699301   ?
10      ?       1210                        ?         ?

Solution :

$$l_{10} = l_9 - d_9 = 93832 - 1293 = 92539$$

$$q_9 = \frac{d_9}{l_9} = \frac{1293}{93832} = 0.01378 ; \qquad p_9 = 1 - q_9 = 1 - 0.01378 = 0.98622$$

$$L_9 = \frac{l_9 + l_{10}}{2} = \frac{93832 + 92539}{2} = 93185.5$$

$$T_{10} = T_9 - L_9 = 3699301 - 93185.5 = 3606115.5$$

$$e^{\circ}_9 = \frac{T_9}{l_9} = \frac{3699301}{93832} = 39.42 ; \qquad e^{\circ}_{10} = \frac{T_{10}}{l_{10}} = \frac{3606115.5}{92539} = 38.97$$

Thus the table after completing the figures is as follows :

Age x   l_x     d_x    q_x       p_x       L_x       T_x         e°_x
9       93832   1293   0.01378   0.98622   93185.5   3699301     39.42
10      92539   1210                                 3606115.5   38.97
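Illustrations 9 and 10, and Illustrations 15 and 17 further on, all apply the same few relations to complete one row of a skeleton life table. A minimal Python sketch, assuming the half-sum rule for $L_x$ used in the worked solutions; `fill_row` is a hypothetical helper written for this example. When only $d_x$ is given, $l_{x+1} = l_x - d_x$ is computed first, as in Illustration 10.

```python
def fill_row(l_x, l_next, T_x):
    """Complete one skeleton life-table row from l_x, l_{x+1} and T_x.

    Uses d_x = l_x - l_{x+1}, p_x = l_{x+1} / l_x, q_x = 1 - p_x,
    L_x = (l_x + l_{x+1}) / 2, T_{x+1} = T_x - L_x and e_x = T_x / l_x.
    """
    d = l_x - l_next
    p = l_next / l_x
    q = 1 - p
    L = (l_x + l_next) / 2
    T_next = T_x - L
    return {
        "d": d, "p": p, "q": q, "L": L, "T_next": T_next,
        "e": T_x / l_x,             # expectation of life at age x
        "e_next": T_next / l_next,  # expectation of life at age x + 1
    }


# Illustration 9 (x = 30)
print(fill_row(762227, 758580, 27296732))

# Illustration 10 (x = 9): only d_9 is given, so derive l_10 first.
l_9, d_9 = 93832, 1293
row = fill_row(l_9, l_9 - d_9, 3699301)
print(round(row["q"], 5), row["L"], row["T_next"], round(row["e"], 2), round(row["e_next"], 2))
```

The same call with $l_{20} = 693435$, $l_{21} = 690673$, $T_{20} = 35081126$ gives the figures of Illustration 15.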


MISCELLANEOUS ILLUSTRATIONS

Illustration 11. Which of the two places, for which mortality data are given below, is in your opinion healthier?

                 Locality A                  Locality B
                 (Standard population)       (Local population)
Age group        Population    Deaths        Population    Deaths
Under 5            4,500         135           4,000         144
5–15              10,000          40          10,500          63
15–65             12,500          75          13,500          81
Above 65           3,000         140           2,000         102
Total             30,000         390          30,000         390

(B. Com., Madras, 1972)

Solution. COMPUTATION OF STANDARDIZED DEATH RATE

                 Locality A                              Locality B
Age group        Population  Deaths  Death rate          Population  Deaths  Death rate
                                     (per 1000)                              (per 1000)
Under 5            4,500       135      30                  4,000       144      36
5–15              10,000        40       4                 10,500        63       6
15–65             12,500        75       6                 13,500        81       6
Above 65           3,000       140     140/3                2,000       102      51
Total             30,000       390                         30,000       390

$$\text{C.D.R. of Locality A} = \frac{390}{30000} \times 1000 = 13 \text{ per thousand}$$

This is also the standardized death rate of Locality A, as Locality A has been taken as the standard.

$$\text{S.D.R. of Locality B} = \frac{(4500 \times 36) + (10000 \times 6) + (12500 \times 6) + (3000 \times 51)}{30000} = \frac{162000 + 60000 + 75000 + 153000}{30000} = \frac{450000}{30000} = 15 \text{ per thousand}$$

Since the standardized death rate of Locality B (15 per thousand) is higher than that of Locality A (13 per thousand), it can be concluded that Locality A is healthier than Locality B.
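Direct standardization, as used in Illustration 11 (and again in Illustrations 12 and 16), weights each local age-specific rate by the standard population of that age group and divides by the total standard population. A short Python sketch of that reading; the function names are illustrative, not from any library.

```python
def age_specific_rates(population, events, per=1000):
    """Age-specific rates per `per` persons for each age group."""
    return [per * e / p for p, e in zip(population, events)]


def standardized_rate(standard_population, rates):
    """Direct standardization: weight age-specific rates by the
    standard population's age distribution."""
    weighted = sum(w * r for w, r in zip(standard_population, rates))
    return weighted / sum(standard_population)


# Illustration 11: Locality A is the standard population.
std_pop = [4500, 10000, 12500, 3000]
rates_a = age_specific_rates(std_pop, [135, 40, 75, 140])
rates_b = age_specific_rates([4000, 10500, 13500, 2000], [144, 63, 81, 102])

sdr_a = standardized_rate(std_pop, rates_a)  # equals A's crude rate
sdr_b = standardized_rate(std_pop, rates_b)
print(f"S.D.R. A = {sdr_a:.2f}, S.D.R. B = {sdr_b:.2f} per thousand")
print("healthier:", "A" if sdr_a < sdr_b else "B")
```

Passing the standard population of Illustration 16 with either country's death rates gives 7.372 and 4.7 per thousand, and the standard population of Illustration 12 with the local unemployment rates gives 97.5 per thousand.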
Illustration 12. From the following table of population and unemployment for the standard and local population, compute :

(i) the general unemployment rate for the standardized population ;

(ii) the standardized unemployment rate for the local population ;

(iii) the crude unemployment rate for the local population.

                 Standardized population          Local population
Age              Population   Unemployment        Population   Unemployment
15–30              2,500           5%               3,000           4%
30–45              3,500           8%               3,000           9%
45–60              3,000          12%               3,500          12%
60 and above       1,000          15%                 500          20%
Total             10,000                           10,000

(B. Com., Bombay, 1975)
Solution. CALCULATION OF THE VARIOUS UNEMPLOYMENT RATES

                 Standard population                           Local population
Age              Population  % unem-   No. of      Rate per    Population  % unem-   No. of      Rate per
                             ployed    unemployed  1000                    ployed    unemployed  1000
15–30              2,500        5%        125         50          3,000        4%        120         40
30–45              3,500        8%        280         80          3,000        9%        270         90
45–60              3,000       12%        360        120          3,500       12%        420        120
60 & above         1,000       15%        150        150            500       20%        100        200
Total             10,000                  915                    10,000                  910

(i) General unemployment rate for the standard population

$$= \frac{915}{10000} \times 1000 = 91.5 \text{ per thousand}$$

(ii) Standardized unemployment rate for the local population

$$= \frac{(2500 \times 40) + (3500 \times 90) + (3000 \times 120) + (1000 \times 200)}{2500 + 3500 + 3000 + 1000} = \frac{100000 + 315000 + 360000 + 200000}{10000} = \frac{975000}{10000} = 97.5 \text{ per thousand}$$

(iii) Crude unemployment rate for the local population

$$= \frac{910}{10000} \times 1000 = 91 \text{ per thousand}$$


Illustration 13. From the data given below, calculate the gross and net reproduction rates :

Age group   Female population   Female live-births   Survival factor
            (in thousands)
15–19            1,399               15,133              .9660
20–24            1,422               94,155              .9668
25–29            1,521              102,676              .9632
30–34            1,756               72,490              .9584
35–39            1,451               31,402              .9519
40–44            1,689               10,640              .9424
45–49            1,667                  700              .9279

(B. Com., Bombay, 1976)
Solution. CALCULATION OF GROSS AND NET REPRODUCTION RATES

Age     Female population   Female births   Specific fertility    Survival    F × S
        (in thousands)                      rate                  factor
        P_i                 B_i             F = B_i ÷ (1000 P_i)  S
15–19      1,399              15,133           .01082               .9660       .01045
20–24      1,422              94,155           .06621               .9668       .06401
25–29      1,521             102,676           .06751               .9632       .06503
30–34      1,756              72,490           .04128               .9584       .03956
35–39      1,451              31,402           .02164               .9519       .02060
40–44      1,689              10,640           .00630               .9424       .00594
45–49      1,667                 700           .00042               .9279       .00039
Total   ΣP = 10,905         ΣB = 3,27,196     ΣF = .21418                      Σ(F × S) = .20598

$$\text{G.R.R.} = 5 \sum_{15}^{49} \frac{B_i}{P_i} = 5 \sum F = 5 \times 0.21418 = 1.0709$$

$$\text{N.R.R.} = 5 \sum_{15}^{49} \left( \frac{B_i}{P_i} \times S \right) = 5 \sum (F \times S) = 5 \times 0.20598 = 1.0299$$
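The G.R.R. and N.R.R. of Illustration 13 reduce to two weighted sums over five-year age groups. The sketch below assumes, as the solution does, that the female population is given in thousands and that each age group spans 5 years; `reproduction_rates` is an illustrative name.

```python
def reproduction_rates(female_pop_thousands, female_births, survival, width=5):
    """Return (G.R.R., N.R.R.) from five-year age-group data.

    F = B / P is the female specific fertility rate of each group (P converted
    from thousands); G.R.R. = width * sum(F) and N.R.R. = width * sum(F * S).
    """
    fertility = [b / (p * 1000) for b, p in zip(female_births, female_pop_thousands)]
    grr = width * sum(fertility)
    nrr = width * sum(f * s for f, s in zip(fertility, survival))
    return grr, nrr


population = [1399, 1422, 1521, 1756, 1451, 1689, 1667]    # thousands
births = [15133, 94155, 102676, 72490, 31402, 10640, 700]   # female live-births
survival = [.9660, .9668, .9632, .9584, .9519, .9424, .9279]

grr, nrr = reproduction_rates(population, births, survival)
print(f"G.R.R. = {grr:.4f}, N.R.R. = {nrr:.4f}")
```

Because every survival factor is below 1, the N.R.R. is always smaller than the G.R.R.; both exceeding 1 is what signals a growing population.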


Since these rates are greater than 1, the population is increasing.

Illustration 14. Compute the general fertility rate and gross reproduction rate from the data given below :

Age group of child-bearing females   15–19   20–24   25–29   30–34   35–39   40–44   45–49
No. of women ('000)                   16.0    16.4    15.8    15.2    14.8    15.0    14.5
Total births                           260   2,244   1,894   1,320     916     280     145

Assume that the proportion of female births is 46.2 per cent.

(M. Com., Delhi, 1977)

Solution :

$$\text{G.F.R.} = \frac{\text{No. of births}}{\text{No. of women of 15--49 years}} \times 1000$$

Total births = 7,059 ; No. of women = 1,07,700

$$\text{G.F.R.} = \frac{7059}{107700} \times 1000 = 65.54$$

$$\text{G.R.R.} = \frac{\text{No. of female births}}{\text{Total births}} \times \text{Total fertility rate}, \qquad \text{Total fertility rate} = 5 \sum \text{S.F.R.}$$

S.F.R. (for age 15–19 years) $= \frac{260}{16000} \times 1000 = 16.25$

S.F.R. (for age 20–24 years) $= \frac{2244}{16400} \times 1000 = 136.83$

S.F.R. (for age 25–29 years) $= \frac{1894}{15800} \times 1000 = 119.87$

S.F.R. (for age 30–34 years) $= \frac{1320}{15200} \times 1000 = 86.84$

S.F.R. (for age 35–39 years) $= \frac{916}{14800} \times 1000 = 61.89$

S.F.R. (for age 40–44 years) $= \frac{280}{15000} \times 1000 = 18.67$

S.F.R. (for age 45–49 years) $= \frac{145}{14500} \times 1000 = 10.00$

$$\sum \text{S.F.R.} \times 5 = 450.35 \times 5 = 2251.75 \text{ per thousand women} ; \qquad \text{T.F.R.} = 2.252 \text{ per woman}$$

No. of female births = 7059 × 46.2% = 3,261.26

$$\text{G.R.R.} = \frac{3261.26}{7059} \times 2.252 = 1.04$$
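Illustration 14's general fertility rate, total fertility rate and G.R.R. can be reproduced in a few lines. This is a sketch under the solution's own assumptions: five-year age groups, women counted in actual numbers (thousands × 1000), and a constant female share of 46.2% of births.

```python
women = [16000, 16400, 15800, 15200, 14800, 15000, 14500]  # ages 15-19 ... 45-49
births = [260, 2244, 1894, 1320, 916, 280, 145]
female_share = 0.462                                       # proportion of female births

# General fertility rate: births per 1000 women aged 15-49.
gfr = 1000 * sum(births) / sum(women)

# Specific fertility rates per 1000 women, then T.F.R. per woman (5-year groups).
sfr = [1000 * b / w for b, w in zip(births, women)]
tfr = 5 * sum(sfr) / 1000

# G.R.R. = (female births / total births) x T.F.R.
grr = female_share * tfr
print(f"GFR = {gfr:.2f}, TFR = {tfr:.3f}, GRR = {grr:.2f}")
```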


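Illustration 15. Fill in the blanks in the following skeleton life table which are marked with question marks :

Age x   l_x      d_x   q_x   p_x   L_x   T_x        e°_x
20      693435   ?     ?     ?     ?     35081126   ?
21      690673                           ?          ?

Solution :

$$d_{20} = l_{20} - l_{21} = 693435 - 690673 = 2762$$

$$q_{20} = \frac{d_{20}}{l_{20}} = \frac{2762}{693435} = .00398 ; \qquad p_{20} = 1 - q_{20} = 1 - .00398 = .99602$$

$$L_{20} = \frac{l_{20} + l_{21}}{2} = \frac{693435 + 690673}{2} = 692054$$

$$T_{21} = T_{20} - L_{20} = 35081126 - 692054 = 34389072$$

$$e^{\circ}_{20} = \frac{T_{20}}{l_{20}} = \frac{35081126}{693435} = 50.59 ; \qquad e^{\circ}_{21} = \frac{T_{21}}{l_{21}} = \frac{34389072}{690673} = 49.79$$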


Illustration 16. Estimate the standardized death rates for the following two countries :


E-1627 : VITAL STATISTICS 


Age group      Death rate per 1000          Standardized population
(in years)     Country I    Country II      (in lakhs)
0–4              20.00        5.00            100
5–14              1.00        0.50            200
15–24             1.40        1.00            190
25–34             2.00        1.00            180
35–44             3.30        2.00            120
45–54             7.00        5.00            100
55–64            15.00       12.00             70
65–74            40.00       35.00             30
75 and above    120.00      110.00             10

(B.A. Hons. Econ., Delhi, 1973)
Solution. CALCULATION OF STANDARDIZED DEATH RATES

Age group      Death rate per 1000      Standardized
(years)        Country I   Country II   population
               X_1         X_2          W            WX_1         WX_2
0–4             20.0         5.0        100          2,000.0        500.0
5–14             1.0         0.5        200            200.0        100.0
15–24            1.4         1.0        190            266.0        190.0
25–34            2.0         1.0        180            360.0        180.0
35–44            3.3         2.0        120            396.0        240.0
45–54            7.0         5.0        100            700.0        500.0
55–64           15.0        12.0         70          1,050.0        840.0
65–74           40.0        35.0         30          1,200.0      1,050.0
75 and above   120.0       110.0         10          1,200.0      1,100.0
Total                                   ΣW = 1,000   ΣWX_1 = 7,372   ΣWX_2 = 4,700

Standardized death rate of Country I : $\text{S.D.R.}_1 = \dfrac{\sum W X_1}{\sum W} = \dfrac{7372}{1000} = 7.372$ per thousand

Standardized death rate of Country II : $\text{S.D.R.}_2 = \dfrac{\sum W X_2}{\sum W} = \dfrac{4700}{1000} = 4.7$ per thousand
Illustration 17. Fill up the blanks in a portion of the life table given below :

Age in years   l        d     p   q   L   T         e
4              95,000   500   ?   ?   ?   4850300   ?
5              ?        400   ?   ?   ?   ?         ?

(M. Com., Delhi, 1975)

Solution :

$l_5 = l_4 - d_4 = 95000 - 500 = 94500$ ; $p_4 = \dfrac{l_5}{l_4} = \dfrac{94500}{95000} = 0.9947$

$q_4 = 1 - p_4 = 1 - 0.9947 = 0.0053$ ; $L_4 = \dfrac{l_4 + l_5}{2} = \dfrac{95000 + 94500}{2} = 94750$

$e^{\circ}_4 = \dfrac{T_4}{l_4} = \dfrac{4850300}{95000} = 51.06$ ; $l_6 = l_5 - d_5 = 94500 - 400 = 94100$

$p_5 = \dfrac{l_6}{l_5} = \dfrac{94100}{94500} = 0.9958$ ; $q_5 = 1 - p_5 = 1 - 0.9958 = 0.0042$

$L_5 = \dfrac{l_5 + l_6}{2} = \dfrac{94500 + 94100}{2} = 94300$

$T_5 = T_4 - L_4 = 4850300 - 94750 = 4755550$ ; $e^{\circ}_5 = \dfrac{T_5}{l_5} = \dfrac{4755550}{94500} = 50.32$

Thus after filling up the missing values, the completed table would look like this :

Age in years   l        d     p        q        L        T         e
4              95,000   500   0.9947   0.0053   94,750   4850300   51.06
5              94,500   400   0.9958   0.0042   94,300   4755550   50.32


17 Interpretation of Data 


Statistics are convincing, and this has led many people to believe that they can be accepted without question. However, this is a false notion, as misuses are probably as common as valid uses of statistics. The figures provide only raw material for someone to reason from; they seldom, if ever, speak for themselves, i.e., they have to be interpreted. The interpretation of data is a very difficult task and requires a high degree of skill, care, judgment and objectivity. In the absence of these, there is every likelihood of the data being misused to prove things that are not at all true. In fact, experience shows that most mistakes are committed, consciously or unconsciously, while interpreting statistical data, and very often facts and figures are presented in such a manner that they are misinterpreted by most readers.

It should be noted that the motive or intent of the misuser of statistics is not at issue : fallacies designed to mislead and those committed unintentionally will not be distinguished. The effect is the same in both cases, although, to be sure, in the first case there is abuse of statistics and in the second case only a misuse.

Statistical fallacies may arise in collection, presentation, analysis and 
interpretation of data. The following are some of the specific examples 
illustrating how statistics can be misinterpreted or how fallacies arise in 
using statistical data and statistical methods : 

1. Bias 

Bias, conscious or unconscious, is very common in statistical work and it leads to false conclusions. For example, if an investigator wants to prove that the level of wages in a factory is very low, he may select the sample in such a manner as to exclude high-paid workers. Very often statistical information is twisted in such a manner as to grind one's own axe. Thus, a labour union may claim that the cost of living index, on which wages are based, should be revised upward because it understates real costs, while an employers' association may defend the index, pointing out components that overstate real costs. Similarly, businessmen may use statistics to prove the superiority of their business over others. For example, examine the statement "The profits of firm A are Rs. 50,000 for the year 1973 whereas the profits of firm B are Rs. 80,000 for the same period, and therefore firm B is better than A." On the face of it, it appears to be so. But a little thinking will reveal that many other variables have to be considered before drawing such a conclusion, such as the amount of capital invested in both the firms.

Unconscious bias is even more insidious. Perhaps all statistical 
reports contain some unconscious bias, since the results of statistical work 
must be interpreted by human beings, each of whom can judge only in 
terms of his own experience and his attitude towards the problem at hand. 
The investigator must disregard his preconceptions and avoid wishful 
thinking in order to attain an objective conclusion. If biased data must 
be used in the absence of better information, the nature and probable 
amount of the bias must be considered in interpreting the results. 


*Spurr and Smith : Business and Economic Statistics.

2. Inconsistency in Definitions

Sometimes false conclusions are drawn because of failure to define properly the object being studied and to hold that definition constant in making comparisons. For example, while comparing the national income figures of two countries, say, India and the U.S.A., it is absolutely essential that the definition of national income is the same in both countries. If the definition is not the same (for example, if one country includes the services of housewives in the calculation of national income whereas the other does not), the figures of national income cannot be compared. Furthermore, for facilitating comparison over a period of time it is necessary to keep the definition of the object being measured constant or, if a change in the definition becomes necessary at a future date, to give a footnote on its likely impact on comparability.


3. Faulty Generalizations 


A basic error that is very often committed in statistical work is to jump to conclusions or generalizations on the basis of either too small a sample or a sample that is not representative of the population to which the conclusions are applied. For example, if by taking a sample of 10–15 boys from a particular section we make a generalization that on the whole the students of that college are very intelligent, it would be misleading. There are two reasons for this : (1) the size of the sample is too small; generally a college has about 1,000–1,200 students, and a sample of 10–15 hardly means anything ; and (2) the sample is not at all representative, because all the boys have been taken from the same section, and it is just possible that this particular section contains the best students.


It may be pointed out that generalizations based on non-typical cases are more difficult to detect because such samples may be adequate in size. For instance, in the above example, if a sample of 200 students is taken from section A of the 1st, 2nd and 3rd years and the same conclusion, that the boys on the whole are very intelligent, is drawn, this would again be misleading. No doubt the size of the sample here is quite adequate, but the sample is not representative because only the boys of the best sections have been included in it. To take another illustration, examine the statement "80% of those who smoke develop indigestion in middle age. Hence, we may infer that smoking leads to indigestion." This type of generalization is not valid unless we also know the corresponding percentage for non-smokers in the population. For example, [...] respectively. This means that out of 10,000 smokers 80%, i.e., 8,000, develop indigestion in middle age. [...] we cannot infer that smoking leads to indigestion. Moreover, [...] study the causes of indigestion. The above examples make it clear that generalizations on the basis of inadequate and non-representative samples should be avoided in the interest of correct conclusions.

4. Faulty Deductions

If we apply a general rule erroneously to a specific case, it would lead to faulty deduction. For example, if the profits of a firm manufacturing 10 different products have declined in 1973, it should not lead one to conclude that the firm is necessarily showing bad performance. It is just possible that it has improved its performance in 9 of the 10 products and has stopped the production of one product altogether. To take figures, suppose the profits of a firm have come down from Rs. 1 lakh in 1973 to Rs. 90 thousand in 1976, and that the firm has stopped the production of commodity 'A', which was fetching Rs. 22,000 per annum, because the raw materials were not available. Profits from the remaining products have therefore risen from Rs. 78,000 (Rs. 1,00,000 − Rs. 22,000) to Rs. 90,000. This means that on the whole the firm has improved. But most people, looking at the figures, would conclude that the firm has done badly as compared to 1973.
5. Inappropriate Comparisons

In order to draw conclusions from the data it is necessary to make comparisons. However, comparisons between two things cannot be made unless they are really alike. Unfortunately this point is generally forgotten and comparisons are made between two dissimilar things, thereby leading to fallacious conclusions. For example, if we are comparing the wholesale price index of 1976 with that of 1960, we must see to it that the number of commodities included, their qualities and the method of constructing the index are the same in both years; otherwise the two indices cannot really be compared. Similarly, if the per capita income of India for 1972-73 is four times as high as in 1952-53, we should not conclude on this basis alone that the people are four times better off today than they were in 1952-53. We must also study the behaviour of prices during this period. Only if there is no change in prices during this period can we draw this conclusion.


6. Misuse of Various Tools of Analysis like Mean, Median, Mode, Dispersion, Correlation, etc.

The various tools of analysis are very often misused to present information in such a manner as to deceive the public. For example, a Public Ltd. Company having 1,005 shareholders may declare that the average holding of its shareholders is 100 shares. But an analysis of individual shareholdings may reveal that five persons who are controlling the company have in all 99,000 shares. Thus the remaining 1,000 shareholders have only 1,500 shares, which gives an average of only 1.5. Similarly, range can be used to exaggerate disparities in the distribution of wages. For example, in a factory the wages of workers may range from Rs. 100 to Rs. 250 a month while the general manager of the factory may be getting Rs. 2,500. A report that the earnings of the employees of the factory range from Rs. 100 to Rs. 2,500 is deceptive. The correlation technique may also be used in such a manner as to establish wrong conclusions. For example, one may observe a high degree of correlation between the yield of rice per acre and rainfall in a certain area. On this basis we should not conclude that the greater the rainfall, the higher will be the yield, for excessive rains may spoil the crops instead of increasing the yield.
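The shareholding example can be checked numerically. Only the totals come from the text; the equal split among the five controllers and the split of the remaining 1,500 shares (500 holders with one share, 500 with two) are assumptions made for this sketch.

```python
from statistics import mean, median

controllers = [99000 / 5] * 5            # assumed equal split of 99,000 shares
small_holders = [1] * 500 + [2] * 500    # assumed split of the remaining 1,500 shares
holdings = controllers + small_holders

print(len(holdings), sum(holdings))      # 1005 shareholders, 100500 shares in all
print(mean(holdings))                    # the declared "average holding"
print(mean(small_holders))               # average of the 1,000 small holders
print(median(holdings))                  # barely affected by the five large holdings
```

The mean of 100 is pulled up entirely by five holdings; the median, under the assumed split, stays at 2, much closer to what a typical shareholder owns.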

7. Technical Errors

Many types of technical errors are possible in statistical work which would lead to wrong conclusions from the data. For example, errors may be committed in the choice of a suitable formula, e.g., the arithmetic mean may be used in a situation where the harmonic mean is more appropriate. Similarly, arithmetical errors may also be committed while classifying or analysing the data. Errors in units of measurement are also common. A frequent error of this kind is to confuse two kinds of logarithms, "natural" and "common", the former being 2.3 times the latter. Another type of error that generally prevails in statistical work is in the use of ratios and percentages. While using these, either a wrong base is used, or 100 is not subtracted in figuring increases, or the nature of the comparison is misunderstood. A comparison of percentages without knowledge of the base to which they refer would lead to error and confusion. For example, let us take a case in which three industries dominate the economic life of a local community. One of them expects in 1977 employment 10% below normal, another 15% below normal, but the third expects employment 25% above normal. Should it be concluded from this that there would be no additional unemployment because the third industry would be in a position to absorb those displaced by the other two? This type of comparison without knowledge of the base is misleading. We must know the number of workers employed in each of these industries. For example, if the numbers employed in the first two industries are 10,000 and 5,000, the number of workers unemployed would be 1,750 (10% of 10,000 plus 15% of 5,000). Now suppose the third industry normally employs 6,000 persons. An increase of 25% would mean 1,500 more jobs. This means that 250 people would still be unemployed. Hence knowledge of the base is absolutely essential while comparing percentages.

Quite often 100 is not subtracted in figuring increases and this leads to wrong conclusions. For example, the price of a commodity may have increased from Rs. 50 in 1960 to Rs. 150 in 1976, and hence one may say that there is a 300% increase in price. However, a little thinking would reveal that the new price is 300% of the old one, so the actual increase is only 300 − 100 = 200%.
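Both percentage pitfalls above come down to keeping track of the base. A small sketch using the figures of the text; the helper names are illustrative.

```python
def change_in_jobs(normal_employment, percent_change):
    """Jobs gained (positive) or lost (negative) for a percentage change
    applied to a known base."""
    return normal_employment * percent_change / 100


def percent_increase(old, new):
    """Percentage increase: the new value as a percentage of the old, minus 100."""
    return new / old * 100 - 100


displaced = -(change_in_jobs(10000, -10) + change_in_jobs(5000, -15))
absorbed = change_in_jobs(6000, 25)
print(displaced, absorbed, displaced - absorbed)   # 1750.0 1500.0 250.0

print(percent_increase(50, 150))                   # 200.0, not 300
```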


8. Failure to Comprehend the Total Background of the Data 


Very often figures are interpreted without comprehending the total background of the data, and this leads to wrong conclusions. For example, the mortality statistics may show that the deaths from T.B. have increased from 50,000 in 1963-64 to 55,000 in 1976-77, and it may be concluded that the death rate from T.B. is increasing. However, for a proper interpretation of these figures it is necessary to comprehend the total background of the data in terms of the following considerations :

(i) The collection and reporting of mortality statistics has improved over the period.

(ii) Some deaths formerly listed as "causes unknown" are now correctly diagnosed as deaths from T.B.

(iii) Other diseases causing death have been brought under control and consequently many more die of T.B. now who would have died of other diseases in earlier times.

(iv) Due to increasing autopsy analysis of the cause of death, there has been an increase in post-mortem diagnoses of T.B., which show that the cause of death originally reported was not the true cause.

Hence we should not jump to conclusions just on the basis of certain figures. We should try to interpret the figures in the light of the total background of the data.

We have illustrated above some common types of statistical fallacies. The reader, however, should not feel that these are the only ways in which statistics can be misused. In fact, this list is far from exhaustive, and as one learns the basic principles of statistics one comes to know the crimes committed in its name. The awareness of the fact that statistics can be misused has led many people to raise slogans like "With statistics anything can be proved" and "There are three types of lies: lies, damn lies and statistics, wicked in the order of their naming." However, it may be pointed out that statistics can neither prove nor disprove anything; figures do not lie, rather liars figure. In fact, many people use statistics like a blind man who uses a lamp-post for support rather than for illumination. For everyone who is making use of statistics there is a lesson here: one should not be misled by bad statistics, and one should not only avoid outright falsehood but must be on the alert to detect possible distortion of truth.
The science of statistics should not be condemned because it can be abused and misused, for the fault lies not with statistics as such but with the users of the subject. An interesting example can be given to illustrate our point. If a child cuts his finger with a sharp knife, it is not the knife that is to blame. Similarly, if a person takes a wrong medicine or an excessive dose of a medicine and dies, we cannot blame the medicine as such. Statistics are very much like clay, of which one can make a God or a Devil as one pleases. Roberts and Wallis have rightly said : "He who accepts statistics indiscriminately will often be duped unnecessarily. But he who distrusts statistics indiscriminately will often be ignorant unnecessarily."


SUGGESTED READING

Wallis and Roberts : Statistics: A New Approach.
Spurr and Smith : Business and Economic Statistics.


Section


1 Statistical System in India


Need

The existence of a sound statistical system for collecting, compiling and analysing data on various aspects of an economy is a precondition for the availability of adequate, accurate and timely statistics, which are essential in framing any suitable policy or measuring its effects. The type of statistical organisation needed in any country is largely determined by the requirements for various types of statistics, which in turn depend upon the level of economic growth and the importance of the role of the State in the economic system of the country. Statistics on various aspects may be collected either by the State or by private bodies. However, the capacity of non-governmental organisations to collect statistical data is generally limited, and they obtain information only for fulfilling certain objects which in most cases are not apt for the economic analysis of problems on a national scale. Moreover, the data collected by private bodies are not comparable owing to various conceptual and other differences among them. For all these reasons, in almost all countries the State is the most important single agency collecting statistics, and the statistical organisation is mostly an integral part of the Government machinery.


Historical Development 

The statistical organisation as it exists in India today has not grown overnight. The history of statistical development in India dates back to ancient times, but in the relatively modern history, which dates from the eighteen-sixties, two periods may be distinguished: one prior to independence and another after independence.


Before Independence. Under the British regime statistics were
largely collected as a by-product of administrative activity, and as
such the statistical machinery was only rudimentary. No proper statistical
organisation was built up in the country during this period. The first
significant development in statistics during the pre-independence period
was the setting up of the Statistical Committee in 1862 to prepare
model statistical forms for the collection and compilation of statistics
in the fields of trade, finance, education, agriculture, etc., which
later led to the publication of the Statistical Abstract for British India in
1868. The first Gazette of India containing economic statistics for
British provinces was started in 1866. The first complete population
census was conducted in 1881 on a uniform basis throughout the
country. On the recommendations of the Indian Famine Commission,
Agriculture Departments were opened in 1881 in various provinces,
inter alia for the collection of agricultural statistics, which led to the
publication of Agricultural Statistics of British India in 1886. In order to
scrutinise and collate the enormous mass of agricultural statistics
which started flowing in from the Provincial Agricultural Departments, a
full-fledged statistical bureau was formed at the Centre in 1895 to
coordinate both agricultural and foreign trade statistics and to collect
statistics relating to prices, wages and industries. This Bureau was
headed by a Director General of Statistics. In 1905 the Director General
of Statistics was replaced by the Director General of Commercial
Intelligence and Statistics, with the additional function of bringing about
liaison with the trading community. In 1925, the Economic Enquiry
Committee was set up to enquire into the adequacy of the
statistical data available and the desirability and possibility of
supplementing them, and of undertaking an economic enquiry. The Committee
concluded that if statistics in India were to be maintained on a
satisfactory basis, all work relating to them should be coordinated and
centralised in one department. The Committee also recommended the
conduct of a quinquennial wage census in large industries and the collection
of statistics of output and raw materials consumed in cottage industries.
The Government, however, did not give effect to
these recommendations. In 1934 another important committee, known
as the Bowley-Robertson Committee, was appointed. As a result of the
recommendations of this committee, the Government of India decided to set
up a central statistical organisation. But this decision could not be
implemented owing to practical and financial difficulties. The outbreak
of war in 1939 gave a fillip to the development of statistics to meet the
requirements of the Government arising from the new responsibilities
it assumed in both the military and civil spheres of activity. Accordingly,
the Government of India set up an Inter-Departmental Committee, with
the Economic Adviser to the Government of India as Chairman, to
consider the statistical material available and to make recommendations
for filling gaps and improving the existing organisations.
The Committee recommended the formation of a central statistical
office for co-ordination, the establishment of statistical bureaus at
State headquarters and the preparation of overall statistics for the entire
country. Thus, although many committees gave thoughtful
consideration to this problem and made valuable recommendations,
a proper statistical organisation was not built up in the country.


After Independence. After independence, in 1949, Prof. P.C.
Mahalanobis, F.R.S., was appointed Honorary Statistical Adviser to
the Cabinet. He brought to the fore the urgent need for a proper
statistical organisation, and the organisation began to take shape. To
start with, a nucleus statistical unit was set up at the Centre in the
Cabinet Secretariat in 1949, which developed into the Central Statistical
Organisation in 1951, with standardization, co-ordination and advisory
functions. Subsequently, in 1961, a full-fledged Department of Statistics
was formed in the Cabinet Secretariat, and the Central Statistical
Organisation became a part of it, along with the National Sample
Survey Directorate. In 1966, a computer centre was set up in the
Department of Statistics to cater to the data processing needs of the
Government Departments located in and around Delhi. Since 1973
the Department of Statistics has been functioning under the Ministry of
Planning.

The new responsibilities for wider social and economic functions
assumed by the Government after the attainment of independence led to a
further demand for statistics and gave great impetus to the development
of statistics. With the formulation of Five-Year Plans for the country, the
need was felt for new types of statistics for judging the progress of Plan
schemes, for overall assessment of the Plans and for evaluation surveys,
and a suitable reorientation of the existing statistical system both at the
Centre and in the States was attempted. An additional stimulus was
provided by the growing statistical requirements of international
organisations like the United Nations and its specialised agencies, and by
their attempts to promote suitable statistical standards for international comparability.


Nature and Structure of the Indian Statistical Organisation 


The nature and structure of the statistical organisation in our
country are necessarily governed by the constitutional set-up. India
being a federation of States, responsibility for government is divided
between the Central Government on the one hand and the State
Governments on the other. Under the Indian Constitution, the
responsibility is divided on the basis of a three-fold classification of all
subjects, i.e., the Union List, the State List and the Concurrent List. The
Union List contains certain items over which the Central Government has
exclusive control. Some of these are Defence, Railways, Posts and
Telegraphs, Currency and Foreign Exchange, Trade and Commerce
with Foreign Countries, Census, Income-tax, Customs and Excise
Duties, etc., and also enquiries, surveys and statistics for the purpose
of any of the matters in the list. In the State List, the exclusive
jurisdiction of the State Governments extends to items such as Public
Health, Agriculture, Livestock, Irrigation, Forests and Fisheries, etc.
The important items in the Concurrent List, where both the Union and
State Governments can operate, are Industry, Vital Statistics, Social
Insurance, etc. Others, like agriculture and education, are assigned to
the States, although enquiries and statistics relating to these items also
figure in the Concurrent List.


Most of the Ministries at the Centre have either a full-fledged 
statistical department, a division or a section depending upon their 
needs and upon the stage of development of statistics in the relevant 
fields. At the State level, various State Government Departments, 
with responsibility for different subjects, contain statistical units, big 
or small, concerned with collection and compilation of statistics relating 
to the particular field. 


Although the jurisdiction of the Central and State Governments
has been clearly defined by the Constitution, in practice no rigid
demarcation has been possible. Even in cases where States have the
primary responsibility for the subject fields, the Centre acts as the
co-ordinating authority for the presentation of data on an all-India
basis. The Department of Statistics, Ministry of Planning (of which the
Central Statistical Organisation is the technical wing), is charged
with the important function of co-ordinating all statistics at the State
and Central levels. The State Statistical Bureaus, attached to different
departments in different State Governments, are charged with the
responsibility of co-ordinating all statistics at the State level and keeping
liaison with the Central Statistical Organisation for purposes of
co-ordination at the all-India level.


I. Statistical Organisation at the Centre


There are at present 121 statistical offices/units in the Central 
Government with a total of 9,076 statistical personnel.* 


At the Centre, most of the Ministries collect or use statistics in
some manner or other and have their own statistical units. They are
of different sizes, are at varying stages of development and are charged
with distinct functions. Some of them, located in the administrative
departments, are engaged in processing data which are purely
by-products of administration. Examples of such agencies are the offices
of the Income-Tax Department, the Central Board of Revenue, the Railways,
Posts and Telegraphs and the Directorate General of Supplies and
Disposals. There are again some statistical units in the organisations
set up for the control of production and distribution of products in short
supply, and these maintain statistics which are of value alike to
Government organisations and the public. Examples of these are the Textile
Commissioner’s Office, the Central Excise Commissioner’s Office and the
Iron and Steel Controller’s Office. There are also various organisations
established by the Government specifically for the purpose of collecting,
compiling and co-ordinating statistical data. These are:
(1) Office of the Registrar General and Census Commissioner of
India, New Delhi.


(2) Directorate General of Commercial Intelligence and Statistics,
Ministry of Commerce.


(3) Labour Bureau, Ministry of Labour.


(4) Directorate of Economics and Statistics, Ministry of Agriculture
and Irrigation.


(5) The Indian Army Statistical Organisation, Ministry of Defence.


(6) National Sample Survey Organisation, Department of Statistics,
Ministry of Planning.


(7) Central Statistical Organisation, Department of Statistics,
Ministry of Planning.


*Statistical System in India, C.S.O. (1970 Ed.).
it as soon as the census operations were over and the reports were
published. In February 1960, the work of collection, compilation, 
publication and improvement of vital statistics was transferred from 
the Ministry of Health to the Office of the Registrar General, India. 


Besides bringing out the census reports, this office brings out
(i) Vital Statistics of India (annual), and (ii) Indian Population Bulletin
(biennial).

2. Directorate General of Commercial Intelligence and 
Statistics, Ministry of Commerce. This Directorate was established 
in the year 1895. Its main functions are as follows: 

(1) To collect and furnish commercial information required by 
the Government and the trade. 

(2) To mediate in commercial disputes between Indian and 
foreign businessmen with a view to bringing about amicable settlements. 


(3) To grant trade introductions. 

(4) To maintain a register of firms in India and to enter therein 
relevant information relating to firms. 

(5) To maintain a commercial library and reading room for the
use of public. 


(6) To extend assistance to commercial concerns with a view to
stimulating the foreign trade of India, particularly the export of Indian
produce and manufactures.

(7) To disseminate commercial information received from the 
Indian Government Trade Representatives abroad. 

(8) To publish the weekly Indian Trade Journal. 

(9) To assist persons engaged in trade and industry. 

(10) To compile and publish statistics of trade, shipping, etc., in 
the publications issued by the department. 


3. Labour Bureau, Ministry of Labour. The Labour Bureau
was set up in the year 1946. It is the most important institution for
collecting, compiling and publishing statistics on various aspects of 
labour. The main functions of the Labour Bureau are : 

(1) Collection, compilation and publication of labour statistics. 


(2) Construction and maintenance of consumer price index
numbers.
(3) Conducting research into specific problems with a view to 
furnishing data required for the formulation of labour policy. 


(4) Bringing out pamphlets and brochures on various aspects of
labour, and other statistical work relating to labour affairs.


4. Directorate of Economics and Statistics, Ministry of Agriculture
and Irrigation. The Directorate of Economics and Statistics
was set up in 1948 under the Ministry of Food and Agriculture (now called
the Ministry of Agriculture and Irrigation) in pursuance of the decision of
the Government of India to centralize all services in the field of agricultural
economics and statistics. The main function of the Directorate is to advise the
Ministry of Food, Agriculture, Community Development and Co-operation
on current issues of agro-economic policies arising out of its
day-to-day work. For this purpose, the Directorate undertakes to 


the Five-Year Plans and the continuous assessment of their progress, 
undertaking special commodity studies, keeping a constant watch on
the behaviour of agricultural prices, obtaining and analysing market
information and offering suggestions regarding corrective measures
whenever necessary, studying land economics, including questions
relating to land policy, rural development and agricultural labour, and
systematically investigating and studying economic and social problems
arising in special fields, etc. The Directorate also functions as the nucleus
for research studies on adverse aspects of agricultural economics,
either undertaken independently or in collaboration with other
organisations.


5. Army Statistical Organisation, Ministry of Defence. The
Army Statistical Organisation (ASO) was set up in 1947. It performs
the following functions:
(1) Maintenance of basic statistical records and the regular
compilation and supply of data regarding personnel, vehicles, armament
and equipment, animals and accommodation, etc.


(2) Control of reports and returns emanating from Army and
Command Headquarters.


(3) Technical advice on statistics in the army.


(4) Design, conduct and analysis of sample surveys, experiments
and investigations.


The Army Statistical Organisation has the largest mechanical
tabulation equipment for the tabulation of data. There is also a
research unit concerned with the application of sampling methods to
surveys, etc.


6. National Sample Survey Organisation, Ministry of Planning.
Another important agency at the Centre is the National Sample
Survey Organisation. The difficulties in the way of
tion of data on a comprehensive basis, 


economy on a continuing basis, as required by the 
Committee, Planning Commission and other Ministries


957 the organisation was transferred to the 


The NSS organisation has been conducting multi-purpose socio-economic
sample surveys in the form of rounds. In 1975, it entered
its 30th round. Up to the 13th round the field enquiries were of varying
duration, ranging from 3 to 8 months. From the 14th round onwards,
each survey round has been of 12 months' duration, so that each round
of the NSSO now coincides with the agricultural year. The programme
for each round is decided by the Department of Statistics in
collaboration with the concerned Ministries and State Governments.


The main functions of the National Sample Survey Organisation
are:

(1) Collection of comprehensive data, on a continuing basis,
relating to socio-economic and demographic conditions, prices, area
and yield of different crops, etc., on a country-wide basis, with a view
to filling the gaps in the statistical information required for national
income estimation, planning, and policy and administrative
decisions by the different ministries of the Government of India.

(2) Conduct of the annual survey and related enquiries in the
organised industrial sector.


(3) Co-ordination of crop estimation surveys on principal crops
in different States and connected methodological and type studies, and
technical participation and guidance for ad hoc surveys.

(4) Programming for supervision of crop-cutting experiments,
participation in the training imparted to the primary field staff and
tabulation of data obtained through these experiments.


7. The Central Statistical Organisation, Department of
Statistics. With the growing demand for statistics, the need was felt
for an organisation at the Centre which could co-ordinate the work of
the Centre and the States in respect of data collection. Consequently,
the Central Statistical Organisation (CSO) was set up in May 1951
in the Cabinet Secretariat with co-ordinating and advisory functions,
i.e., co-ordination of statistical activities at the Centre and in the States,
advisory work relating to statistical matters, provision of national data
for the regular publications of the United Nations and other international
statistical organisations, and graphical presentation of economic and
social statistics. However, the functions of the CSO have expanded
considerably with the transfer of the National Income Unit from the Ministry
of Finance to the CSO in 1954, the transfer of the Directorate of Industrial
Statistics from the Ministry of Commerce and Industry in 1957, the setting
up of a separate unit to look after statistical work relating to planning in
collaboration with the Planning Commission, the expansion of training
facilities for statistical personnel, and so on. In 1961, the status of the
Central Statistical Organisation was raised to that of a department,
with the creation of a Department of Statistics inside the Cabinet
Secretariat, which also had the Directorate of National Sample Survey
as its subordinate office. In 1966 a computer centre was also established
under the Department of Statistics.


Besides the Industrial Statistics Wing at Calcutta, the CSO has
the following divisions at its headquarters in Delhi : 


(a) Industry and Trade Division, including small industries unit. 
(b) Manpower Research Division. 

(c) Methodology Division. 

(d) National Income Division.

(e) National Sample Survey Organisation. 

(f) Planning and State Statistics Division. 
(g) Population Division.

(h) Prices and Cost of Living Statistics Division. 
(i) Statistical Intelligence Division. 

(j) Training Division. 

The main functions of the CSO are : 


(1) Co-ordination of statistical activities including the work re- 
lating to National Sample Survey and other sample surveys. 


(2) To maintain liaison between the Central Ministries and State 
Governments, to organise statistical conferences and committees and 
render secretarial assistance. 


(3) To tender advice to Central Ministries and other Government
organisations on statistical matters. 

(4) To keep liaison with the United Nations Statistical Commis- 
sion and other international agencies. 

(5) To undertake research in development of standards and me- 
thodological studies. 


(6) Co-ordination and dissemination of statistical intelligence,
including graphical presentation. 


(7) To undertake statistical work relating to planning. 
(8) To organise and conduct training courses in official statistics. 


(9) To compile national income estimates and undertake studies
in the field of national accounts. 


(10) To plan and co-ordinate the conduct of Annual Survey of 
Industries and to tabulate and publish the results thereof. 


(11) Construction of price and cost of living indices for the middle
classes.


(12) To prepare estimates of the population every year. The
census of population in India takes place only once every ten years, but
population estimates are needed for each year to frame suitable policies
in different spheres. These estimates are prepared by the CSO.


(13) To take up any work pertaining to collection of statistics on 
the request of any State Government or Central Government. 


Publications of the C.S.O. 


The C.S.O. brings out a large number of publications. Some of
these are:


1. Monthly Abstract of Statistics. The Monthly Abstract of
Statistics presents key statistical data pertaining to various facets of
the Indian economy. The data are classified under 15 heads such as (i)
Population; (ii) National Product; (iii) Employment; (iv) Industrial
Production; (v) Transport; (vi) Foreign Trade; (vii) Consumption and
Stocks; (viii) Banking and Currency; (ix) Joint Stock Companies;
(x) Finance; etc.


2. Monthly Statistics of the Production of Selected Industries of India.
With a view to economising on printing and paper, the publication is
being issued at intervals of two months as a combined issue, as for
January and February, 1966. It gives monthly production statistics relating to
most of the major industries, covering 412 items for all India and 79
items by States. It also contains information about the index of industrial
production, installed capacity and mill stock. Statewise production
figures of selected industries are also presented.

3. Statistical Pocket Book of the Indian Union. This is a pocket-size 
annual publication and presents in more than 140 tables a concise 
factual account of social and economic trends in India based on 
principal statistical series currently available. The latest issue relates 
to the year 1976. 


4. Statistical Abstract. This is an annual publication. It presents,
in 257 tables, a comprehensive and authoritative compilation of national
statistics relating to area and population, climate, agriculture, mining,
industries, banks, joint stock companies, motor vehicles, balance of
payments, etc. All-India totals are also provided. The latest issue
relates to the year 1976.


5. Annual Survey of Industries. This publication appeared under
the name ‘Census of Indian Manufactures’ up to the year 1958 and
was renamed as above from 1959. The former annual
census up to 1958 covered only 29 major industries in the factory sector
(power-using establishments employing 20 or more workers on any day).
The Annual Survey of Industries brings within its scope all factories in
the registered sector, i.e., those employing 10 or more workers and
using power, or 20 or more workers and not using power. The factories
employing 50 or more workers using power and 100 or more workers
not using power, classified under some 242 industry groups, are being
covered in the above report.


Besides these regular publications a number of ad hoc publications 
are also brought out by C.S.O. Some of these are : 

(1) Brochure on Revised Series of National Product for 1960-61 to 
1964-65. 

(2) Estimates of Capital Formation in India: 1960-61 to 1965-66.

(3) Estimates of Savings in India : 1960-61 to 1965-66. 

(4) Official Statistics: Sources of Data and Major Gaps (1968).


(5) Statistical System in India.

(6) Monthly production of Selected Industries of India. This 
bulletin presents the monthly production data for 312 items covered in 
the index of industrial production and indices for the latest 3 months. 


(7) Sample Surveys of Current Interest in India (Annual). It
contains information about the sample surveys conducted by the
various offices of the Central and State Governments, universities and
research institutions in the country.


II. Statistical Organisations in the States

Statistical organisations in the States are of more recent origin than
their counterparts at the Centre. As provinces of former British India,
they were collecting certain statistics for the Centre on what were called
standardised imperial tables, on subjects like agriculture, education, vital
statistics, excise duty, and so on. Before independence there were in
India, in addition to the provinces, a very large number of princely
States scattered all over the country and at different stages of social
and economic development. These princely States, with a few
exceptions, had hardly any organisations which produced data of any
kind beyond what was barely necessary for the collection of revenue.
After independence these States were merged with the former
Provinces, now known as the States of the Indian Union.


Since the war years, and particularly in the wake of the
recommendations of the Gregory Committee of 1946, State Statistical Bureaus have
been set up in all the States, either as independent statistical
organisations or as part of a combined economic and statistical set-up.
The conference of Central and State statisticians held annually since
1951, known as the Central Technical Advisory Council on Statistics,
gave a further fillip to the setting up and strengthening of such
organisations. Since 1964 all States and Union Territories have had either a
Directorate of Economics and Statistics or a Department of Statistics
to look after all types of statistical work.

The important functions of State Statistical Bureaus are:


(i) Co-ordination of statistics collected by different departments.


(ii) Publication of a Statistical Abstract collecting all essential
serial statistics.


(iii) Organising special enquiries and surveys.


(iv) Liaison between the statistical organisations at the Centre and
in other States.


(v) Statistical work relating to planning. 


There are, however, considerable differences between the different
State Statistical Bureaus in the areas of their responsibility for the
collection of statistics. Thus, while in some States statistics are almost centralized
in the Bureau, in most other States the collection of agricultural, labour
and vital statistics generally falls outside the scope of work of the State
Statistical Bureaus. Some of these Bureaus, such as those in West Bengal,
U.P. and Bombay, have been conducting a number of socio-economic
enquiries from time to time to collect data required for the formulation
of policy in the States. In recent years most of the State Statistical
Bureaus have joined the collaboration programme of the NSSO for
conducting multipurpose surveys on a continuing basis. This has
enlarged the scope of the work of the Statistical Bureaus considerably.
Most of the States now have mechanical tabulation units for
processing NSSO data. Some of the Bureaus are conducting
socio-economic enquiries of their own on subjects not covered by the
NSSO to meet their specific requirements. Recently, the Bureaus have
been conducting an annual sample census on population, births and deaths
at the instance of the Registrar General, India. The Bureaus have also
been assigned the work of computing State incomes.


An important development which took place in recent times in
the State statistical system is the programme of Central assistance
extended to States under the Second and Third Five-Year Plans for the
expansion of the statistical organisations and their activities in the States.
Strengthening of Statistical Bureaus for planning needs, setting up of
District Statistical Offices, setting up of Administrative Intelligence
Units for compilation of Community Development/National Extension
Service statistics and training of statistical personnel are some of the State
statistical schemes for which Central assistance was extended. Another
noteworthy development in the statistical system of the States has
been the appointment of a Statistical Assistant, designated differently
in different States, for example as Progress Assistant or Statistical Officer
(Extension), in each of the development blocks. Through this agency it
has been possible to co-ordinate statistics of all types emanating at the
block level. For supervising statistical work in the development
blocks and also for co-ordinating the activities at the district level,
District Statistical Offices were also set up on a phased basis.


III. Non-Government and Semi-Government Statistical 
Organisations 


Non-Government statistical organisations include the statistical- 
cum-economic agencies outside the Government constituted under 
statutory provisions such as : 


(1) Indian Statistical Institute. 

(2) Institute of Agricultural Research Statistics. 

(3) Statistics Department of the Reserve Bank of India. 
(4) Economic Department of the Reserve Bank of India. 


(5) The National Council of Applied Economic Research, New
Delhi. 


(6) The Institute of Applied Manpower Research, New Delhi. 

(7) The Institute of Labour Research, Bombay. 

(8) The Institute of Economic Growth, Delhi. 

(9) The Gokhale Institute of Politics and Economics, Poona. 

(10) The Institute of Foreign Trade, New Delhi. 

A brief description of some of these is given below:

1. The Indian Statistical Institute. An account of the
statistical organisation in India would be incomplete without a
mention of the Indian Statistical Institute, which is a professional
(non-official) organisation. It was established in 1932. It has helped in
developing the statistical system in India in three different ways:

(i) As a learned society,

(ii) As a centre of research and training, and

(iii) As an agency for conducting large-scale statistical projects.

Since 1938 the Institute has been holding examinations for
the award of certificates and diplomas of proficiency in statistics. It
was long in charge of the technical work relating to the National Sample
Survey. The Indian Statistical Institute (jointly with the International
Statistical Institute and UNESCO) also runs an
International Statistical Education Centre at Calcutta. At this centre,
a nine-month training programme for students from Asian countries is
being run. In accordance with a decision of the Government of India
in 1952, the Institute is now functioning as a focal centre for
professional training and research and as the national statistical and
computational laboratory in India. In recognition of its work, the Indian
Parliament, through the enactment of the Indian Statistical Institute Act in
December 1959, declared it an institute of national importance and
empowered it to confer degrees in Statistics. Accordingly, since 1960 the
Institute has offered courses of study for the degrees of Bachelor of
Statistics (B. Stat.) and Master of Statistics (M. Stat.). The two higher
degrees of Ph.D. and D.Sc. have also been introduced. As a centre
for professional training in Statistics, the Institute has expanded and
reorganised its training activities in recent years in order to meet the
growing need for statisticians. The Institute organises several training
courses, such as a 6-9 months officers’ training course in Statistics in
collaboration with the CSO and technical training for computers, field
investigators and operators for machine tabulation. The Institute has
also established branches in different parts of India. It also brings
out a monthly journal entitled ‘Sankhya’.


Up to 1970 the Indian Statistical Institute, since the setting up
of the National Sample Survey, was in charge of all its technical work,
which included design of the survey, preparation of the schedules,
tabulation of data and report-writing. In 1970 a separate independent
organisation known as the NSSO was set up to do all this work.


2. Institute of Agricultural Research Statistics. The Institute of
Agricultural Research Statistics (IARS), formerly the statistical branch of the
Indian Council of Agricultural Research, was set up in accordance with 
the recommendations of the Royal Commission on Agriculture in 1928. 
It has done pioneering work, among other things, in introducing the 
method of random sampling for the estimation of yield of crops and 
evolving suitable techniques for experimentation in cultivators’ fields. 
The IARS has developed into a leading centre for research and training 
in the above subjects and conducts post-graduate courses for the award 
of certificates and diplomas in agricultural research statistics. 


3. Statistics Department of the Reserve Bank of India. The
Statistics Department of the Reserve Bank of India in Bombay is carrying
out excellent research work. The Division of Monetary Research, in
collaboration with other Divisions, produces the Annual Report of the
Bank, Currency and Finance and various monthly and quarterly reports
for the use of the Bank, Government Departments and the public. The
Division also attends to specialised research work relating to stock
exchanges, bullion markets, public finance and banking problems. The
Statistics Department of the Reserve Bank brings out a monthly 
bulletin called Reserve Bank of India Bulletin containing a large amount 
of statistical material in the form of tables on various aspects. 


The above description clearly shows that it is only after independence
that we have had a statistical organisation worth the name in our
country. The statistical system in our country has gradually been
decentralised. Formerly, we had a highly centralised system for the
collection of statistics, and the Department of Commercial Intelligence
and Statistics was the pivot round which the wheel of the statistical
organisation revolved. All important statistics were collected, compiled
and published by this Department. Today this Department has lost its
monopoly. As far as the Centre is concerned, each Ministry has a statistical
unit (some have more than one) which is responsible for the collection
and publication of statistics relating to the subject of the Ministry.


Almost all the States in the country have either a Directorate of 
Economics and Statistics or Bureau of Statistics. The State statistical 
organisation is responsible for the collection and publication of statistics
relating to the State concerned. Each State has district statistical 
officers for exercising supervision and processing of data collected from 
various sources. The Directorate or the Bureau in each State 
co-ordinates the statistics collected by the other units in the State. 
The problem of inter-state co-ordination is solved by the Central Statis- 
tical Organisation. 


It should be noted that the Government is trying its level 
best to improve the quality of data available and also to enlarge the 
scope. A special scheme for augmentation of the mechanical tabula- 
tion facilities was included in the Fourth Plan of all the States. 
Another significant development has been the setting up of an electronic 
data processing centre in the Department of Statistics. Ten electronic 
computers (Honeywell 400) are being installed in different parts of
the country to remove the bottlenecks hitherto experienced in the
tabulation of data and also to take care of increased tabulation and processing
requirements during the Fourth Plan. Realising the role which statistics
will have to play in our developing economy, the Government of
India recently constituted a cadre of professional statisticians known
as the Indian Statistical Service (ISS). A staff college for the training
of statistical personnel has been set up recently.


TRY YOURSELF 
1. What is meant by a “National Statistical System”? State the principal
functions of such a system. Outline the main features of the present statistical
system in India indicating the principal agencies concerned with the collection and
publication of statistical information. (B. A. Hons. Econ., Delhi, 1968)
2. Discuss in detail the organisation and functions of the Central Statistical 
Organisation. (B. Com. Lucknow 1969) 


3. Make a critical review of the present statistical organisation in India. 
(B. Com. Punjab, 1967)


4. Describe the statistical organisation of India and point out its deficiencies
in the context of a developing economy. (B. Com. Punjab 1969) 


5. Explain briefly the functions performed by the following Statistical Units
in India :
(a) Directorate of Economics and Statistics.
(b) Department of Commercial Intelligence and Statistics.
(c) Department of Research and Statistics (Reserve Bank of India).
(B. Com., Punjab, 1972) 


6. Write a note on the evaluation of the present statistical organisation of
India and point out its main shortcomings. (B. Com. Gorakhpur 1973)
7. Write a note on the main functions of the CSO in India. Do you think
the organisation is serving some useful purpose in respect of co-ordination of data
emanating from various agencies ? (M. Com. Gorakhpur, 1968)


8. Discuss the functions of the Central Statistical Organisation. Can it be
assigned the task of co-ordinating the work of the various official agencies collecting
statistics in India ? (B. Com. B.H.U., тобо)
9. Give an account of the organisation known as Central Statistical Organisation
of India and the work done by it. (B.A. Bombay, 1970)


10. Discuss the organisation and functions of the CSO in India. What
suggestions would you offer to make it more useful and effective ?
(B. Com. Kurukshetra, 1975)
11. Write a critical essay on the Statistical System in India.
(M. Com. Punjab, 1975)
12. Write a note on the Present Statistical Organisation of India, pointing
out its weak points, and suggest ways of improvement. (M.A. Econ. Jabalpur, 1975)
13. Trace briefly the development of C.S.O. since its inception. Mention 
at least 3 of its important divisions, at least 3 of its important functions and at least 
3 of its publications. 
(B. Com. Bombay, 1976) 


Section 2

Agricultural Statistics


The term ‘Agricultural Statistics’ has a very wide connotation and
may be said to include all statistics having a bearing on the different
fields of the agricultural economy, such as statistics of land utilisation,
production of crops, livestock and animal husbandry, forestry, fishery,
mines and minerals, poultry farming, etc. Such statistics
are indispensable in framing suitable policies in respect of agriculture
and also in measuring the efficacy of these policies. In a country
like India, whose economy is predominantly agricultural in character,
the need for accurate and timely agricultural statistics is especially great.


Scope of Agricultural Statistics 
The Food and Agriculture Organisation (FAO) of the United
Nations has given the following classification of agricultural statistics :

1. Basic agricultural statistics pertain to statistics of landholdings
and their characteristics. The latter include the size of holdings,
form of tenure, fragmentation, land utilisation, employment, mechanisation,
etc. The data under this category throw light on the resources
and structure of agriculture in India.

2. Agricultural Statistics proper comprising statistics of area and 
yield and of livestock and their products. 

3. Agricultural statistics in the wider sense refer to statistics of 
agricultural stocks, trade, prices and consumption including that of
livestock. The coverage of statistics belonging to this category extends 
to the cost of living of farmers, loans to them, taxes and other levies on 
farms, labour employed and data pertaining to forests and fisheries. 

The FAO has laid down the following requirements, which it
considers basic and essential for all agricultural statistics to satisfy :

(i) Utility 

(ii) Significance 

(iii) Reliability 

(iv) Adequate coverage 

(v) Timeliness 
(vi) International comparability. 


Historical Development 
Statistics pertaining to agriculture have been collected in our country
from very early times. Unlike in other countries of the world, in India
the responsibility for collecting these statistics has remained with the
Government. Land revenue has been the principal source of revenue
for the States, and as such, in ancient days rulers collected such statistics
and maintained a complete record of land acreage and production so as
to determine land revenue. Kautilya wrote in his Arthashastra that
land records were elaborate and complete even in the days of the 
Ramayana. According to him, during the reign of Raja Ram Chandra 
land revenue was based on agricultural produce. During the reign of 
Akbar, a Ministry of Agriculture was established under the charge of 
Raja Todarmal. The Ministry maintained detailed records of the
classification of land area under cultivation of cereals, orchards, fallows, 
forests and land under State occupation. Thus we find that agricultural 
statistics in the past have largely been a by-product of land revenue 
administration. 


After Independence, the Government has made a sincere effort to 
improve the quality and coverage of agricultural statistics so that they 
are used not only for raising revenue but also for framing suitable 
policies in respect of agriculture. The States are primarily responsible 
for collecting agricultural statistics. At the Centre, the Directorate
of Economics and Statistics under the Ministry of Agriculture and
Irrigation co-ordinates these statistics and publishes them on an
all-India basis.


In this section the agricultural statistics of India have been 
discussed under the following heads : 


I. Statistics of Land Utilisation.
II. Statistics of Crop Output.
III. Miscellaneous Agricultural Statistics.
1. Livestock and Poultry Statistics.
2. Statistics of Forestry.
3. Statistics of Fishery.
IV. Indices of Agricultural Production.

I. Statistics of Land Utilisation 


As the term suggests, land utilisation statistics refer to those 
statistics which relate to the utilisation of the entire tract of land which
comes within the geographical limits of a country. Ordinarily, it is not
possible to utilise every inch of land that a country possesses, as some
part of it may be taken up by forests, some areas may not be easily
accessible and there might be considerable difficulties in the utilisation
of others. To have a full picture of land utilisation it is necessary to
collect a large variety of statistics relating to total area, area under
forests, area not available for cultivation, area of other uncultivated
land and current fallows, net area sown, area devoted to different crops
and the area and crops irrigated and unirrigated.

Statistics of land utilisation have been available in India since
1884. However, their coverage and scope have gradually expanded over
time, which renders the figures for different years non-comparable. For the
year 1950-51, statistics of land utilisation were available for 28.43 crore
hectares, or 87 per cent of the total area of 32.80 crore hectares. For
1973-74, these statistics were available for 30.42 crore hectares, or 92.74
per cent of the total area.
Statistics of land utilisation are published in the annual Indian
Agricultural Statistics issued by the Directorate of Economics and
Statistics, Ministry of Food and Agriculture. This publication is
brought out in two volumes. Vol. I gives statistics relating to various 
States of the Indian Union and Vol. II relates to different districts of 
various States. Land utilisation statistics are available in this publica- 
tion under the following heads : 

1. (A) Total area. 

(B) Classification of Area. 

2. Area Irrigated and Crops Irrigated. 

3. Area Under Crops. 
1 (A). Total Area

The term ‘Total Area’ here does not mean the total geographical 
area of the country but the ‘Total Reporting Area’. By total reporting 
area is meant such area for which “complete accounting of land utilisation
is possible”. In other words, total area (or total reporting area)
refers to such area about which land utilisation statistics are available. 


Statistics of total area are obtained from two sources : 
(i) Surveyor General of India. 

(ii) Village records maintained by the Revenue Department. 

The total geographical area of the country is 32.8 crore hectares.
1 (B). Classification of Area 

The classification of area which was followed till 1949-50 was as 
under : 

(i) Forests. 

(ii) Area not available for cultivation. 

(iii) Other uncultivated lands excluding current fallows. 

(iv) Current fallows. 

(v) Net area sown. 

However, from the year 1950-51, a new classification has been
adopted which gives more detailed information. At present there are 
nine classes under which information is available. These are : 

(i) Forests. 

(ii) Land put to non-agricultural uses. 

(iii) Barren and uncultivable land. 

(iv) Permanent pastures and other grazing lands. 

(v) Miscellaneous tree crops and groves not included in the net
area sown.

(vi) Culturable waste. 

(vii) Current fallows. 
(viii) Other fallow land. 
(ix) Net area sown. 
2. Area Irrigated and Crops Irrigated

Statistics of land utilisation published in the Agricultural Statistics 
of India also give an idea about the total irrigated area of the country 
and about the crops irrigated. Details about irrigated area are available 
according to the different sources of irrigation, namely : 

(i) Canals. 

(ii) Tanks. 

(iii) Wells. 

(iv) Other sources like temporary bunds for the storage of rain 
water and streams too small to be classed as canals.


Of the net area under cultivation, 32.2 per cent is irrigated.
During the period 1950-51 to 1973-74, the net irrigated area increased
by 1.17 crore hectares, as shown in the following table.


                                              (Crore hectares)

Source of irrigation      1950-51     1973-74     Increase (+) or decrease (−)

Canals                     0.83        1.30        (+) 0.47
Tanks                      0.36        0.39        (+) 0.03
Wells                      0.60        1.33        (+) 0.73
Other sources              0.30        0.24        (−) 0.06

Total                      2.09        3.26        (+) 1.17

Source : India, 1976.
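The arithmetic of this table can be checked with a small Python sketch that recomputes each change and the two column totals from the figures above (crore hectares). The dictionary layout and the function names are illustrative only.

```python
# Net irrigated area by source of irrigation, in crore hectares,
# as (1950-51, 1973-74) pairs taken from the table (Source: India, 1976).
IRRIGATED_AREA = {
    "Canals": (0.83, 1.30),
    "Tanks": (0.36, 0.39),
    "Wells": (0.60, 1.33),
    "Other sources": (0.30, 0.24),
}


def changes(table):
    """Change for each source (1973-74 minus 1950-51), rounded to 2 decimals."""
    return {src: round(later - earlier, 2) for src, (earlier, later) in table.items()}


def totals(table):
    """Column totals (1950-51, 1973-74), each rounded to 2 decimals."""
    earlier = sum(pair[0] for pair in table.values())
    later = sum(pair[1] for pair in table.values())
    return round(earlier, 2), round(later, 2)


if __name__ == "__main__":
    for src, delta in changes(IRRIGATED_AREA).items():
        sign = "+" if delta >= 0 else "-"
        print(f"{src:<14} ({sign}) {abs(delta):.2f}")
    first, last = totals(IRRIGATED_AREA)
    print(f"{'Total':<14} {first:.2f} -> {last:.2f} ({last - first:+.2f})")
```

The rows add up to the totals of 2.09 and 3.26 crore hectares, and the changes add up to 1.17; "Other sources" is the only source that shows a decrease (0.06 crore hectares).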


3. Area Under Crops 


These statistics relate to land under various crops and land put to 
other uses. Two sets of acreage figures are available at present in our
country.

(A) Official series based on village records, and


(B) The NSSO series based on sample surveys. 


(A). The Official Series. The official series relate to statistics 
of land utilisation giving the area of land put to different uses and the 
area under different crops. These statistics are available since 1884. 
In the official series, the entire country has been divided into two broad 
categories, namely : 

(i) Temporarily settled areas, and 


(ii) Permanently settled areas. 


(i) Temporarily Settled Areas. The system of Temporary Settlement
or Ryotwari System was introduced in the year 1892 in U.P.,
Punjab and Tamil Nadu. In temporarily settled areas the figures of
acreage are collected by the village accountant or the Patwari and are
recorded by him in his register, popularly known in Northern India as
Khasra. These Patwaris are known by various names in different
parts of the country, such as Karamchari in Bihar and Telhati in Tamil
Nadu. In U.P. Patwaris have been replaced by Lekhpals. One
Lekhpal is in charge of one village or a group of villages, depending upon
the size of the population and land area. He keeps records of individual
fields, crops and ownership of fields. He is expected to make a field-to-field
inspection called Partal. The work of the village accountant is
supervised by his immediate superior officer, known as the
Kanoongo. Kanoongos are also known as Revenue Inspectors. They
supervise the work of the Patwaris by checking 7% of the entries
in the Khasra and making a 10% personal inspection of fields. Superior
to the Revenue Inspectors are Taluka Officers, Naib Tehsildars and Tehsildars,
in charge of their respective areas. Such officers supervise and
verify the work of junior officers. Finally, there are District Officers,
called Collectors, who control the work of revenue collection in the
entire district under their charge. The figures supplied by different
primary reporting agencies within a district are totalled to find out the
total area devoted to various crops in the district. State totals are
obtained by adding district figures.


(ii) Permanently Settled Areas. In permanently settled areas the
land revenue is fixed and not subject to change. For this reason
the States are not much interested in collecting the information
concerning land area and production of crops. There is no primary
reporting agency on a permanent basis in any of these areas. The
police Chowkidar or the village Mukhia or Headman is entrusted with
the task of collecting statistics of land revenue. He also maintains
certain statistical records, but he is not trained for this work. Most
of the statistical information of these areas is mere guesswork and there
is no supervisory staff to check and verify the entries in the register of
the village headman. The records maintained by the village headman
are transferred to the Sub-Divisional Officer, who modifies the figures
in the light of his own experience and forwards them to the district officer
concerned. The district officer modifies the figures for the entire district and
forwards them to the Director General of the State, who makes available
a consolidated figure of land acreage for the whole State.


There is no uniform system of measuring the area under various
crops, and different States follow different practices. Some changes
have been made in the practice for the collection of agricultural statistics
in permanently settled areas. In 1944-45 the Bihar Government established
a Primary Reporting Agency of Karamcharis, who
have been recording statistics of land area on the basis of complete
field inspection. The Government of West Bengal conducted a plot-to-plot
survey in 1944-45 and collected figures of land acreage on the basis
of random sampling. In some inaccessible areas the Indian Council of
Agricultural Research (ICAR) conducted aerial surveys, but the experiment
was not very satisfactory.


The statistics of land acreage are relatively more reliable in 
temporarily settled areas. In permanently settled areas there is need 
for proper survey of all land areas for which maps should be prepared 
separately for various tehsils and talukas. Area statistics need to be
collected on the basis of complete enumeration, and sampling should
not be resorted to for all places.
(B) The NSSO Series. Apart from the figures collected from
village records, there is another set of figures collected through the NSSO.
The NSSO collects data on the area under different crops during its
regular rounds of survey. Estimates are given for the whole country and
also for certain population zones. These figures are based on random
sampling. The estimates of area under different crops are given in
terms of ‘gross area’ and ‘allocated area’. The gross area is defined as
the area under the crop grown singly (one crop) together with the area 
under all mixed crops having that crop as one of the components. The 
allocated area under a crop has been defined as area under the crop 
grown singly plus the apportioned area under the crop from all the 
mixed crops having that particular crop as one of the components. 
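The difference between gross and allocated area can be made concrete with a short Python sketch. The field records, crop names, areas and the share assigned to each crop in a mixture are all hypothetical; the text does not say how the NSSO apportions a mixture among its components, so the shares are simply taken as given.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Field:
    """A sampled field (hypothetical record layout).

    `shares` maps each crop on the field to the fraction of the field's area
    apportioned to it: {"wheat": 1.0} for a crop grown singly, or e.g.
    {"wheat": 0.6, "gram": 0.4} for a mixture (the fractions are assumed given).
    """
    area: float
    shares: dict[str, float]

    def __post_init__(self) -> None:
        if abs(sum(self.shares.values()) - 1.0) > 1e-9:
            raise ValueError("crop shares on a field must add up to 1")


def gross_area(fields: list[Field], crop: str) -> float:
    """Sole-crop area plus the whole area of every mixture containing the crop."""
    return sum(f.area for f in fields if crop in f.shares)


def allocated_area(fields: list[Field], crop: str) -> float:
    """Sole-crop area plus only the crop's apportioned part of each mixture."""
    return sum(f.area * f.shares[crop] for f in fields if crop in f.shares)


if __name__ == "__main__":
    fields = [
        Field(2.0, {"wheat": 1.0}),               # wheat grown singly
        Field(1.0, {"wheat": 0.6, "gram": 0.4}),  # wheat-gram mixture
    ]
    for crop in ("wheat", "gram"):
        print(crop, gross_area(fields, crop), allocated_area(fields, crop))
```

For these two hypothetical fields, wheat has a gross area of 3.0 hectares but an allocated area of 2.6, and gram 1.0 against 0.4. Summed over crops, the allocated areas (3.0) equal the area sown, whereas the gross areas (4.0) count the mixed field once for each of its crops.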


There are differences between the figures given by NSSO and those 
obtained through village records. These differences arise due to the 
following factors : 

(i) Difference in methods. 

(ii) Difference in coverage of crops. 

(iii) Difference in seasons to which figures relate. 

(iv) Difference in the field work of the two agencies.

(v) Difference in the classification of area under food and fodder 
crops. 

(vi) Difference in the allocation of area under mixed crops. 

(vii) Difference due to sampling error in the NSSO estimates. 


While the all-India land utilisation statistics are published in the 
Indian Agricultural Statistics, the individual States publish these statis- 
tics in the Season and Crop Reports. The scope of these reports differs 
from one State to another. The statistics of land utilisation published 
in the respective Season and Crop Reports generally agree with those 
published in the Indian Agricultural Statistics. 


The statistics of land utilisation are also published in summary 
form in the following publications : 
(i) Agricultural Situation in India (monthly). 
(ii) Indian Agriculture in Brief (annual). 
(iii) Abstract of Agricultural Statistics (annual). 
(iv) Statistical Abstract of the Indian Union (annual), CSO. 


… part of land records, … hand, there is no such proper arrangement …
The position in regard to the availability and reliability of the data has not been
quite satisfactory. Over the past few years, steps have been taken
to rectify the situation, the most noteworthy among which were the
adoption of sample surveys by the States of West Bengal, Kerala and
Orissa and the introduction of complete enumeration in the State of …
II. Crop Output Statistics

In a broader sense, output statistics of the agricultural sector refer to
the production figures of food and non-food crops including fibres, 
oilseeds, etc. Besides these, there are other sources of output also such 
as fisheries, forests and livestock products. However, here we shall 
discuss only the crop output statistics (both food and non-food crops). 


Though the production of foodgrains in the country has fluctuated
from year to year, it reached a new peak of 120.8 million tonnes in
1975-76, as against 99.8 million tonnes in 1974-75, an
increase of 21 million tonnes.


Strictly speaking, statistics of output of various crops should be 
estimated only after the crops have been cut and harvested. But 
ordinarily it is not possible to do so, particularly in a country like ours,
where there are millions of cultivators having small and scattered holdings
and not maintaining any type of farm accounts. Under such
circumstances output statistics have to be estimated before the crop is
actually harvested, and as such they are in the nature of crop forecasts.
Output statistics are estimated by multiplying the area under a crop by
the expected yield per acre in the season concerned.

As in the case of acreage statistics, two sets of figures are available 
about yield also. They are : 

1. Official Series, and

2. NSSO Series. 


1. Official Series. Official series are more comprehensive and
cover a large variety of crops. Two methods are used to obtain statistics
of crop yields :
A. Traditional Method.
B. Random Sampling Method.


A. Traditional Method. Under this method yield statistics are 
obtained by finding out the product of the area under crop and the
average yield per acre of the crop concerned. Average yield is obtained 
by multiplying the normal yield by the condition factor. 


Normal yield has been defined as “average yield on an average soil
in a year of average character”. That is, the normal yield is that which
past experience has shown to be the most generally recurring crop in
a series of years, the typical crop of the local area, the crop that the
cultivator has a right to expect.


Normal yield is usually computed by conducting crop-cutting
experiments on selected plots. On these plots the sowing, harvesting,
threshing, etc., are done in the presence of officers of the Agricultural
Department. Selected plots are expected to be representative of the
area concerned. After the crop is harvested, it is weighed before the
officers. Certain allowance is made for the moisture contained in it.
However, these estimates are not very reliable, for several reasons.
It is not possible to find an accurate limit for the errors of these estimates.
Sometimes it is stated that there is an error to the extent of 20%.
The plots are not always representative since, after the selection of the
plot, all care is given in the form of irrigation, manure, seeds, etc. The
number of crop-cutting experiments is also not sufficient. The Royal
Commission on Indian Agriculture considered the practice of estimating
normal yield by the “Eye Average Method” superior to crop-cutting
experiments. In Punjab the “Eye Average Method” is already in
operation.

The condition factor is found by the Annawari System. It gives 
condition of the crop in any particular season in relation to the normal 
crop. Generally a fixed number of annas represents the normal crop.
This number of annas representing the normal crop is not uniform
throughout the country and often varies from one State to
another. In some States fifteen annas represent the normal crop, while in
others only thirteen annas serve the purpose. The condition factor is
estimated twice, during the growth of the crop and again at the time
of harvesting. The condition factor is estimated by lekhpals
in their respective areas. The lekhpals submit such figures to the
tehsildar, who makes his own estimates and fixes the condition factor or
the anna estimate for his tehsil. Tehsil estimates are forwarded to
the Director of Agriculture so as to enable him to fix one single figure
for the State as a whole.
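The traditional calculation (output = area × average yield, with average yield = normal yield × condition factor) can be sketched in a few lines of Python. The function names and figures are hypothetical, and the rule that the condition factor equals the anna estimate divided by the number of annas that stands for the normal crop in the State is an assumption made for illustration.

```python
def condition_factor(anna_estimate: float, normal_annas: float) -> float:
    """Condition of the crop as a fraction of the normal crop.

    Assumed rule: the anna estimate divided by the number of annas that stands
    for a normal crop in the State (e.g. 15 in some States, 13 in others).
    """
    if normal_annas <= 0:
        raise ValueError("normal_annas must be positive")
    return anna_estimate / normal_annas


def traditional_output(area: float, normal_yield: float,
                       anna_estimate: float, normal_annas: float) -> float:
    """Output = area x average yield, where average yield = normal yield x condition factor."""
    average_yield = normal_yield * condition_factor(anna_estimate, normal_annas)
    return area * average_yield


if __name__ == "__main__":
    # Hypothetical tehsil: 1,000 acres, normal yield of 10 units per acre.
    for normal in (15, 13):
        out = traditional_output(1000, 10, anna_estimate=12, normal_annas=normal)
        print(f"12-anna crop where {normal} annas is normal: {out:.0f} units")
```

Under these assumptions the same 12-anna estimate gives 8,000 units where fifteen annas represent the normal crop but about 9,231 units where thirteen do, which illustrates why anna figures from different States cannot be compared directly.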


The estimates of the condition factor made by the lekhpals are
usually not reliable, for several reasons. The lekhpals do not possess
the necessary technical knowledge. Each patwari has his own conception
of the normal crop. Apart, therefore, from the subjective bias in
estimation, the interpretation of the normal yield rests entirely with
the lekhpal.


A number of changes, both in technique and in method of collection,
are needed. Crop-cutting experiments should be extended to all the …
shortcomings associated with the yield estimates in our country.


B. Random Sampling Method. The random sample survey method
was recommended as early as 1919 and some State Governments tried
it, but they could not continue due to shortage of funds. For example,
Bihar and Orissa abandoned the method due to huge expenditure. The
ICAR introduced the random sample survey method for cotton in
1942, and later the method was extended to foodgrain crops in U.P.,
Punjab, Tamil Nadu, Maharashtra, Orissa and Madhya Pradesh. Since
1943, crop-cutting surveys based on the random sampling technique have
been conducted under the guidance of the ICAR in all the States of India
except West Bengal, where the job has been entrusted to the care
of …

In the new technique, a number of villages are selected from the
tehsils in the States on the principle of random sampling, and within each
village a few fields are selected on the same basis. In each selected field
a plot of one-eighth of an acre (33 ft × 165 ft) is selected at random. After the
selection of the plot, the crop is cut from the plot, threshed, winnowed
and weighed immediately. Certain allowance is made for the residual
moisture on the basis of special experiments conducted for the purpose.
Field work is conducted by the staff and officers of the Revenue and
Agricultural Departments of the State. The ICAR provides technical
guidance through its technical officers and similar staff available in
the State concerned.


The advantage of the sample survey method is that an unbiased estimate
of the yield per acre can be obtained. Also, the margin of error by
which this estimate is likely to depart from the true unknown value can
be estimated.
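A minimal Python sketch of how such an estimate and its margin of error might be computed from crop-cutting plots follows. It assumes each plot is one-eighth of an acre, so a plot's weight is multiplied by 8 to give yield per acre. It also treats the plots as a simple random sample and uses a normal approximation (z = 1.96) for an approximate 95 per cent margin. The actual survey selects villages, then fields, then plots, for which the variance formula is more involved. The driage factor and the plot weights are hypothetical.

```python
import math
import statistics

PLOTS_PER_ACRE = 8  # each crop-cutting plot is one-eighth of an acre


def per_acre_yields(plot_weights, driage=1.0):
    """Convert plot weights to yield per acre.

    `driage` is a hypothetical multiplicative allowance for residual moisture
    (1.0 means no adjustment).
    """
    return [w * driage * PLOTS_PER_ACRE for w in plot_weights]


def estimate_yield(plot_weights, driage=1.0, z=1.96):
    """Return (mean yield per acre, standard error, margin of error).

    Treats the plots as a simple random sample (an assumption) and ignores
    any finite population correction.
    """
    yields = per_acre_yields(plot_weights, driage)
    n = len(yields)
    if n < 2:
        raise ValueError("at least two plots are needed to estimate the error")
    mean = statistics.mean(yields)
    std_error = statistics.stdev(yields) / math.sqrt(n)
    return mean, std_error, z * std_error


if __name__ == "__main__":
    weights = [40, 50, 45, 55, 60]  # hypothetical plot weights (kg)
    mean, se, margin = estimate_yield(weights)
    print(f"{mean:.0f} kg per acre, standard error {se:.1f}, +/- {margin:.1f}")
```

For these five hypothetical plots the estimate is 400 kg per acre, with a standard error of about 28.3 kg per acre.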


2. NSSO Series. The National Sample Survey Organisation 
collects yield statistics regarding major cereal crops during the course
of its regular surveys. A special set of investigators who collect
miscellaneous statistical data in rural areas has been entrusted with the
collection of crop yield statistics also. This agency has not been very
successful, as the investigators very often do not reach the village concerned
at the time of the harvest. The NSSO figures are about 33%
higher than the official estimates. The NSSO is of the opinion that
the difference between the two figures is due to the following reasons :

(i) difference in shape and size of the cut adopted by the two
agencies, resulting in border bias and location bias;

(ii) difference in driage factors;

(iii) difficulties of harvesting at the proper time with a set of
moving investigators in the case of NSSO rounds;

(iv) difference in the methods of estimation; and

(v) difference in the methods of supervision.


It should be noted that a working group was set up in the CSO to
examine the difference between the two estimates of cereal crops. It is
hoped that its findings will enable reconciliation by removing the
shortcomings of both the methods.


Publications on Agricultural Statistics 

The Directorate of Economics and Statistics, Ministry of Agricul- 
ture and Irrigation, brings out a large number of publications on 
different aspects of agriculture. There are in all £6 publications, the
names of which are given at the back of the monthly journal entitled
Agricultural Situation in India. The names of some leading publications
are given below : 


Weekly. 1. Weekly Bulletin of Agricultural Prices.
Monthly. 2. Agricultural Situation in India.
Annual. Agricultural Statistics.
3. Estimates of Area and Production of Principal Crops in India
(Summary Tables), Vol. I.
4. Estimates of Area and Production of Principal Crops in India
(Detailed Tables).
5. Agricultural Prices in India.
6. Agricultural Wages in India.
7. Indian Agricultural Statistics, Vol. I.
8. Indian Agricultural Statistics, Vol. II. (Detailed Tables).
9. Indian Livestock Census, 1971, Vol. I. (Summary Tables).
10. Indian Livestock Census, 1971, Vol. II. (Detailed Tables).
11. Area, Production and Yield per Acre of Forest Crops in India.
12. Bulletin of Food Statistics.
13. Bulletin of Agricultural Prices.
14. Indian Crop Calendar.
15. Indian Forest Statistics.
16. Cotton in India.
17. Jute in India.
18. Lac in India.
19. Tobacco in India.
20. Tea in India.
21. Coffee in India.
22. Rubber in India.
23. Indian Agricultural Atlas.


III. Miscellaneous Agricultural Statistics
1. Livestock and Poultry Statistics.
2. Statistics of Forests.
3. Statistics of Fisheries.


1. Livestock and Poultry Statistics. Statistics of livestock
were first collected at the instance of the Secretary of State for
India, and it was in the year 1883 that the Statistical Conference prescribed
a form on which the details of cattle census were to be filled.
Since then figures of livestock have been published quinquennially in
Agricultural Statistics in India. In 1920 a cattle census was held for
the whole country, and since then such a census has been held every fifth year.
Livestock statistics are mainly collected through quinquennial
censuses conducted by the State Governments and co-ordinated on an
all-India basis by the Directorate of Economics and Statistics. The
Directorate fixes the census date and lays down the pro forma. The
States collect the required data in respect of both the rural and urban
areas with reference to the prescribed date, and compile and publish them.
The opportunity of the census is taken to collect data not only on the
number of livestock but also on that of poultry, agricultural machinery
and implements.


So far 11 livestock censuses have been conducted, the eleventh relating to the year 1971. The heads under which livestock, poultry and agricultural machinery statistics are classified are:


A. Livestock : (i) Cattle. (ii) Buffaloes. (iii) Sheep. (iv) Goats. 
(v) Horses and Ponies. (vi) Other Livestock (comprising mules, 
donkeys, camels and pigs). 


B. Poultry. 


C. Agricultural Machinery and Implements: (i) Ploughs: (a) wooden, (b) iron. (ii) Carts. (iii) Sugarcane crushers: (a) worked by power, (b) worked by bullocks. (iv) Oil engines with pumps (for irrigation purposes). (v) Electric pumps (for irrigation purposes). (vi) Tractors (used for agricultural purposes only). (vii) Ghanies: (a) five kg and more, (b) less than five kg.


Statistics of livestock and livestock products, etc., are available in the following publications:
(i) Indian Livestock Census (Quinquennial).
(ii) Indian Livestock Statistics (Annual).
(iii) Agricultural Statistics of India (Vols. I and II).
(iv) Abstract of Agricultural Statistics in India.
(v) Statistical Abstract of the Indian Union published by the CSO.


2. Forest Statistics. Forests are one of the most important and basic natural resources of the country. They occupy an area of 47.6 million hectares and account for approximately 22 per cent of the total land area of the country. The term "Forest Statistics" refers to all types of data relating to the forest economy, such as area under forests, volume and value of forest products, employment in forestry, revenue and expenditure in forestry, and trade in forest produce.


Before independence, forest statistics were published in the Annual Returns of Statistics Relating to Forest Administration in India issued by the Central Forest Department, Government of India. Now they are available in an annual volume, Indian Forest Statistics, published by the Directorate of Economics and Statistics, Ministry of Food and Agriculture. The information in this volume is based on returns submitted by the Forest Departments of the different State Governments. Published figures relate to March 31 of each year. Broadly speaking, the following types of data are available in this publication:
(i) Area under forests.
(ii) Volume of timber and firewood. 
(iii) Outturn of timber and other minor produce. 
(iv) Employment of labour in forestry and forest industries. 


(v) Revenue and expenditure statistics of forest department and 
forestry. 


(vi) Foreign trade. 
Other publications which contain details of forest statistics are : 


(i) Abstract of Agricultural Statistics published by the Directo- 
rate of Economics and Statistics. 


(ii) Statistical Abstract of the Indian Union published by CSO.


(iii) Annual Administration Reports of Forest Departments of
different States in India. 


(iv) A Review of Forest Administration in India, quinquennial 
volume published by the Government of India. 


3. Fisheries Statistics. Statistics of fisheries are inadequate, non-comparable, unorganised and unco-ordinated. Whatever statistics are available are of recent origin. Prior to 1950 the only published material on fisheries in our country was contained in the Annual Report of the Department of Fisheries, Tamil Nadu, and in some statistics published by the Department of Fisheries in West Bengal (which was wound up in 1923). The available data on fisheries fall into the following categories:


(i) Data available in marketing reports. One major source of data on fisheries is the Report on the Marketing of Fish in India brought out by the DMI.

(ii) Data available with fisheries research institutes and stations, such as the Central Marine Fisheries Research Institute (CMFRI).

(iii) Data available with the Fisheries Development Adviser and in State Gazettes.

(iv) Data on the consumption of fish collected by the NSSO.


In the Fifth Plan, a special boost will be given to the fisheries sector.


IV. Indices of Agricultural Production
Up to 1950-51, index numbers of agricultural production covering 19 principal agricultural commodities were compiled with the quinquennium 1934-35 to 1938-39 as base by the Directorate of Economics and Statistics, Ministry of Food and Agriculture. In view of the progress made in the compilation, coverage and technique of crop estimates, these indices were considered out of date. A revised series of indices of agricultural production was brought out in 1950-51 with 1949-50 as base. In 1953-54 the base was changed to the agricultural year ending June 1950. The index covered 28 crops.


Index Numbers of Agricultural Production in India 1971-72* 


The Directorate of Economics and Statistics, Ministry of Agriculture and Irrigation, Government of India, has brought out a revised series of Index Numbers of Agricultural Production with the triennium ending 1961-62 = 100 as the base, replacing the series with base 1949-50. The new series is based on the recommendations of the Technical Committee on index numbers relating to the agricultural economy, set up in 1965 under the chairmanship of the late Dr. V. G. Panse to look into the scope, coverage, methodology, etc., of the various agricultural index numbers being constructed at different levels. Taking into account the various criteria for selecting a base period, the triennium ending 1961-62 was chosen as the base period. This is in accordance with the recommendations of the Central Advisory Committee on Statistics as well as of the Food and Agriculture Organisation. Further, a number of new series, i.e., index numbers of net area sown, cropping pattern, cropping intensity and productivity per hectare of net area sown, have also been introduced for the first time.


To make the all-India series on agricultural production more representative, the coverage of the revised series has been enlarged to 38 crops, divided into two main groups and eight sub-groups, as against 28 crops divided into two main groups and six sub-groups in the old series. The crops included in the revised series account for nearly 94 per cent of the total gross cropped area in the country.

As in the old series, the Index Numbers of Agricultural Production are constructed by the chain-base method. The production of a crop during a year is expressed as a relative of its production in the preceding year, keeping the coverage and the method of estimation the same. These relatives for each crop are linked to the base year through the intervening chain relatives to give the production index for the crop. The weighting pattern of the Agricultural Production Index has also not undergone any major change, either group-wise or item-wise. Average prices in the base period have been used as weights. These are either the harvest prices or the wholesale prices during the peak marketing period in the primary markets.
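The chain-base procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the production figures are hypothetical (not official data), a single year stands in for the triennium base, and only one crop is indexed, so the base-period price weighting that combines crops is left out.

```python
def chain_base_index(pairs):
    """Chain-base production index for a single crop.

    pairs: (year, production_this_year, preceding_year_same_basis) tuples in
    chronological order. The first tuple is the base (index = 100) and its
    third element is ignored. preceding_year_same_basis is the preceding
    year's production estimated on the SAME coverage and method as this
    year's figure, so each link relative compares like with like.
    """
    base_year = pairs[0][0]
    index = {base_year: 100.0}
    running = 100.0
    for year, current, preceding in pairs[1:]:
        link_relative = current / preceding   # this year relative to last year
        running *= link_relative              # link back to the base year
        index[year] = round(running, 1)
    return index


# Hypothetical production of one crop (thousand tonnes), not official data.
series = [
    ("1961-62", 500.0, None),
    ("1962-63", 520.0, 500.0),   # same coverage as the year before
    ("1963-64", 561.6, 540.0),   # coverage widened: 1962-63 re-estimated at 540
]
print(chain_base_index(series))
```

Here the index works out to 104.0 for 1962-63 and 108.2 for 1963-64. Comparing 561.6 directly with 520.0 would suggest 8 per cent growth in 1963-64, part of which merely reflects the wider coverage; the like-for-like relative (561.6 / 540) records 4 per cent, which is the reason for keeping coverage and method the same within each link.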

In order to maintain continuity in the Index Numbers of Area under Crops, Agricultural Production and Yield, the new series has been linked with the old series by suitable conversion factors for the base period in respect of individual crops. The table below gives indices of agricultural production for selected years from 1960-61 to 1974-75.

* Prepared in the Statistical Intelligence Division of the Department of Statistics. Source: RBI Bulletin, March 1973.


INDEX NUMBERS OF AGRICULTURAL PRODUCTION
(BASE: TRIENNIUM ENDING 1961-62 = 100)

Year       Foodgrains       Non-Foodgrains    All Commodities
           (weight 65.86)   (weight 34.14)
1960-61      102.1            103.8             102.7
1965-66       89.9            107.1              95.8
1966-67       91.9            103.7              95.9
1967-68      117.1            115.6             116.6
1968-69      115.7            113.2             114.8
1969-70      123.5            120.5             122.5
1970-71      133.9            126.6             131.4
1971-72      132.0            128.9             130.9
1972-73      121.2            118.9             120.4
1973-74      131.5            137.1             133.4
1974-75      125.6            136.5             129.3


Source: Estimates of Area and Production of Principal Crops in India, Directorate of Economics and Statistics.
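The group weights in the table header can be used to see how the all-commodities index relates to the two group indices. The sketch below assumes that the all-commodities figure is the weighted arithmetic mean of the foodgrains and non-foodgrains indices with weights 65.86 and 34.14; the figures are those of the table as read here, and the published all-commodities values agree with this calculation to within rounding.

```python
# Group indices from the table (base: triennium ending 1961-62 = 100).
WEIGHT_FOOD, WEIGHT_NON_FOOD = 65.86, 34.14

TABLE = {  # year: (foodgrains, non-foodgrains, all commodities as published)
    "1960-61": (102.1, 103.8, 102.7),
    "1965-66": (89.9, 107.1, 95.8),
    "1966-67": (91.9, 103.7, 95.9),
    "1967-68": (117.1, 115.6, 116.6),
    "1968-69": (115.7, 113.2, 114.8),
    "1969-70": (123.5, 120.5, 122.5),
    "1970-71": (133.9, 126.6, 131.4),
    "1971-72": (132.0, 128.9, 130.9),
    "1972-73": (121.2, 118.9, 120.4),
    "1973-74": (131.5, 137.1, 133.4),
    "1974-75": (125.6, 136.5, 129.3),
}


def combined_index(food, non_food):
    """Weighted arithmetic mean of the two group indices."""
    total_weight = WEIGHT_FOOD + WEIGHT_NON_FOOD
    return (WEIGHT_FOOD * food + WEIGHT_NON_FOOD * non_food) / total_weight


for year, (food, non_food, published) in TABLE.items():
    print(f"{year}: computed {combined_index(food, non_food):.1f}, published {published}")
```

Because foodgrains carry nearly twice the weight of non-foodgrains (65.86 / 34.14 is about 1.93), a given fall in the foodgrains index pulls the all-commodities index down almost twice as far as an equal fall in non-foodgrains, which is why a year such as 1965-66 stands out so sharply in the last column.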


Defects of Indian Agricultural Statistics 


A brief account of agricultural statistics shows that since independence considerable improvement has been made in their quality and coverage. However, there is still scope for improvement, as these statistics suffer from a number of defects. The main reason for most of the defects is that agricultural statistics have largely been a by-product of administrative activity. Some of the defects, as pointed out by the Technical Committee on Co-ordination of Agricultural Statistics in India, are as follows:


1. Gaps in Coverage. These are of two types: (a) gaps in geographical coverage, and (b) gaps in the availability of statistics in respect of certain items.


With regard to geographical coverage, the available statistics relate to 89 per cent of the total land area (721 million out of 811 million acres).

In respect of certain items, statistics are not available at all. These are wages, hours of work and non-cash benefits of agricultural labour in India, the cultivator's part-time occupations, his annual income from agriculture, and his cost of production and its various components.


2. Lack of Uniformity in Definition and Classification. Various terms and phrases are used in different senses in different parts of the country, and even at present there is a lack of uniformity in the definitions of the terms used in the classification of agricultural statistics. Not only this, even the techniques of analysing the collected figures are not uniform throughout the country. The method of obtaining area statistics differs between temporarily settled areas and permanently settled areas. The acreage statistics of the country are not of uniform quality. For example, land is considered fallow in Punjab if it remains uncultivated for two years; in Maharashtra the period is ten years, and in U.P. five years. The method of yield estimation also varies from State to State. In some States the Annawari system is used, in which the crop is expressed as a fraction of the normal crop. Even within the Annawari system, different States follow different rules.
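The effect of these differing fallow definitions on comparability can be made concrete with a small sketch. It encodes only the three periods quoted above; reading "uncultivated for N years" as "uncultivated for at least N years" is an assumption made for illustration.

```python
# Period of non-cultivation (years) after which land is classed as fallow,
# as quoted in the text for three States.
FALLOW_PERIOD_YEARS = {"Punjab": 2, "U.P.": 5, "Maharashtra": 10}


def classed_as_fallow(state: str, years_uncultivated: int) -> bool:
    # Assumption: "uncultivated for N years" is read as "at least N years".
    return years_uncultivated >= FALLOW_PERIOD_YEARS[state]


# The same plot, left uncultivated for three years, judged by each State's rule.
for state in FALLOW_PERIOD_YEARS:
    print(state, classed_as_fallow(state, 3))
```

Identical land is counted as fallow in Punjab but not in U.P. or Maharashtra, so adding up State figures on "fallow land" mixes three different concepts.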

3. Defective Classification, Tabulation and Processing. The classification and tabulation of data are not satisfactory in respect of certain items relating to agriculture. The classification of area does not take into account economic and other considerations and gives no idea of the suitability of land for various purposes. The printed schedules in which Lekhpals collect statistics and submit returns are very elaborate but are not uniform throughout the country. Most of the information collected by the primary reporting agency at the village level is wasted for want of proper statistical treatment. A good deal of the information collected is not tabulated or interpreted beyond the tehsil level and thus cannot be properly used. Much improvement can be made in the processing of the collected information by making suitable changes in the schedules for the collection of agricultural statistics.

4. Defects of Primary Reporting Agency. The accuracy of the statistics depends largely upon the care taken by the primary reporting agency. The Lekhpals do not take due care of the work entrusted to them. In most cases their records are wrong, and they complete the Khasra or schedules sitting at home according to their vague impressions, whims, etc. Most of the defects in the primary reporting agency are due to:

(i) the increased burden of work on Lekhpals and their participation in duties outside their routine work,

(ii) the low remuneration paid to them, which compels them to supplement their income by undesirable methods, and

(iii) lack of adequate training of Lekhpals in handling their job.

The Lekhpals have to be better paid and trained if the quality of the data is to improve.

5. Defective Supervision and Inspection. As pointed out earlier, the work of Lekhpals is to be checked by the Kanoongos, Naib-tehsildars and Tehsildars. In practice, however, the entries recorded by the Lekhpals are taken as correct and rarely verified, with the result that defects in the statistics collected by them remain unnoticed and the statistics are passed on to higher stages without any correction or editing. The method of checking is also defective, because the Kanoongo is expected to check only those cases in which there have been considerable changes from the past. The entries to be checked should instead be selected on the basis of random sampling. It is satisfying to note that some States are now insisting on better supervision of the work of the primary reporting agencies and are gradually adopting the system of random sample checks. Schemes of central supervision and checking have also been introduced by the Government of India recently. However, much remains to be done. It is necessary that the tour programmes of the officials entrusted with the task of checking be kept strictly confidential so that surprise checks of the work are possible.
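The difference between checking only entries that show a large change and checking a random sample of entries can be sketched as follows. The village register, the plot numbers, the areas and the change threshold are all hypothetical; the random rule is a simple random sample without replacement, which is one way of applying the random-sampling principle recommended above, not a prescribed official procedure.

```python
import random


def change_based_selection(entries, threshold):
    """Criticised practice: check only entries whose recorded area changed
    considerably from the previous year."""
    return [e["plot"] for e in entries
            if abs(e["area_now"] - e["area_last"]) > threshold]


def random_selection(entries, k, seed=None):
    """Random-sample check: k entries drawn without replacement, so every
    entry has the same chance (k/N) of being verified."""
    rng = random.Random(seed)
    return sorted(e["plot"] for e in rng.sample(entries, k))


# Hypothetical village register: 100 plots, areas in acres.
register = [{"plot": i, "area_now": 2.0, "area_last": 2.0} for i in range(1, 101)]
register[4]["area_now"] = 5.0   # plot 5 shows a large recorded change

print(change_based_selection(register, threshold=1.0))   # only plot 5
print(random_selection(register, k=10, seed=7))          # 10 plots chosen at random
```

An entry simply copied forward from last year without inspection shows no change, so the first rule can never catch it; under the second rule it still has a 10 per cent chance (10 out of 100) of being verified, and the Lekhpal cannot know in advance which entries will be picked.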
6. Lack of Proper Co-ordination. One major defect in our agricultural statistics is the unsatisfactory co-ordination among the various types of data collected by different departments of the Government. In practice, little attention is paid by a department to the statistics collected by other departments. There is also duplication of work. Various departments conduct ad hoc surveys on the same problem independently of each other, which not only results in a huge wastage of public funds and human energy but also produces widely differing results that become a problem for those who use such statistics. Attempts are being made to achieve co-ordination, and the CSO is doing a great service in this regard.


7. Delay in Publication. Besides the other defects, the delay in the publication of whatever statistics are available makes them much less useful than they could be. For example, crop forecast statistics, which could be published well before the crop is actually harvested, are always published a month late, and in some cases only when the crop has actually reached the market. Annual volumes are, on an average, published two or three years after the year to which they refer.


Delay in publication is admitted by the authorities, but they express their helplessness, the excuse being that India is a big country and it takes time for the figures to travel from the village where they are collected to the tehsil, then to the district, the State and finally the central consolidating authority, where they are processed and published on an all-India basis. In this long chain, if the figures are delayed at any stage, the delay multiplies. It is suggested that efforts should be made to minimise the delay at each level, and that the data should be forwarded to the Central Government in addition to being forwarded to the State headquarters.


The main reason for all sorts of defects in agricultural statistics is the insufficient attention paid to them by the Government. As has been pointed out earlier, the collection of agricultural statistics in India started not as a regular scheme but as a by-product of official administration. It is only recently that the Government has realised the seriousness of the situation, when most of its schemes could not be executed because of the paucity of adequate statistics.


The Government is conscious of the defects that exist in the available agricultural statistics. Efforts are being made to improve their coverage and reliability. These efforts are directed towards the following aspects:

(i) Improvement in accuracy.
(ii) Amplification of the scope of existing data.
(iii) Extension of geographical coverage.
(iv) Standardisation of data.
(v) Minimising delay in publication.
(vi) Processing of data.
(vii) Co-ordination of data collected by different agencies.
(viii) Collection of new statistics.
Besides these, a scheme for strengthening Central supervision over crop estimation by the various State agencies has been evolved. In areas where primary reporting agencies do not exist, these agencies are being set up. Also, the other work of the primary reporting agency, i.e., the Patwari, is being reduced so that he can devote more time to the collection of agricultural statistics. Crop-cutting surveys are being extended so as to cover all the principal crops in the various States. Besides collecting statistics of the area and yield of crops, attention is being paid to collecting data on consumption, stocks, cost of cultivation of crops, percentage of marketable surplus, etc.


Programme of Improvement of Agricultural Statistics during the Fifth Plan

During the Fourth Plan, a scheme was initiated for the timely reporting of estimates of area and production of principal crops. The scheme aimed principally at providing estimates of area immediately after sowing and of production immediately after harvest. During the Fifth Plan, this scheme is proposed to be continued and enlarged in scope and coverage. Another Fourth Plan scheme which will be continued in the Fifth Plan relates to studies of the cost of cultivation of crops, so as to provide comprehensive data on a continuing basis to help in the formulation of price policies for agricultural commodities. One of the major developments proposed in the Fifth Plan is the establishment of a regular primary crop-reporting agency in Kerala, Orissa and West Bengal, where, at present, statistics of area and production of crops are framed on the basis of sample surveys only. Another important scheme envisaged during the Fifth Plan relates to the supervision of area enumeration and crop-cutting experiments, with the twin objectives of furnishing reliable advance estimates through supervised samples and of providing a basis for improving the procedures of collection and compilation of area and production statistics. This scheme will be operated under the auspices of the National Sample Survey Organisation. The Fifth Plan also contains provisions for an extended programme of agro-economic research. An agricultural census on a complete enumeration basis, with the 1970-71 agricultural year as reference, was carried out during the Fourth Plan. A provision has been made in the Fifth Plan to repeat this on a quinquennial basis so as to provide data for each succeeding five-year period.


With these steps, it can be expected that agricultural statistics will improve further in the years to come.


TRY YOURSELF 

1. What specific improvements have been made in Crop Area and Production Statistics in India? Comment on the comparability of official data on foodgrains production since independence.

(M.A. Econ., Jabalpur, 1974)

2. Attempt a critical examination of the quality and content of Agricultural 
Statistics in India. 

3. Write a lucid note on the nature and scope of agricultural statistics in 
India. (B. Com., Bangalore, 1974) 

4. Discuss the present system of collection of agricultural statistics relating to area and yield in our country. What further improvements can you suggest?

(M. Com., Gorakhpur, 1973)

5. Estimate of the yield per acre is normally made either by (a) the traditional method or (b) the random sampling method. Explain these two methods, pointing out their weaknesses, and suggest ways of improving them. (B. Com., Punjab, 1970)


6. How are the statistics relating to area and yield of agricultural crops collected in India, and how is the random sampling method of estimating agricultural yield superior to the traditional method? (B. Com., B.H.U., 1973)


7. "The newly revised agricultural output index of India is better constructed and has wider coverage, but it has completely lost its continuity with the old series." Comment. (B.A. Hons. Econ., Delhi, 1972)

8. Discuss the comprehensiveness and reliability of Indian Agricultural Statistics. What steps have been taken in recent years to improve their efficiency? (B. Com., Punjab, 1974)

9. Write a note on the nature and sources of official statistics relating to 
acreage and production of principal crops in India. 
(M.A. Econ./M.Sc. Agr., Solan, 1975)
10. How are crop forecasts prepared in India? Discuss the need for improving the accuracy of these forecasts. (B. Com., Kurukshetra, 1976)


Section 3

Industrial Statistics


Nature 


These days the economic development of a country is measured more in terms of industrial development than anything else. However, no systematic development is possible in the absence of adequate, reliable and timely data on various aspects of industries, such as the number of industrial units, their size, capital structure, number of workers employed, wages paid to them, output, raw materials, power consumed, etc. By industrial statistics we mean factual data on these various aspects of industries. Broadly speaking, statistics pertaining to industries can be classified under the following heads: (a) Statistics of Output; (b) Statistics of Inputs; (c) Statistics of Employment; (d) Statistics of Capital Structure; (e) Other data, such as stocks, potential expansion, maximum capacity of the units, etc. Industrial statistics cover not only the large-scale industries but also the small-scale and cottage industries.


In our country, no serious attempts were made to collect industrial statistics prior to independence, owing to the deliberate indifference of the British rulers towards industrial development. Hence, before independence, industrial statistics were highly inadequate and unsatisfactory. It is only after independence that a suitable machinery for the collection of industrial statistics has been evolved.


Historical Development 


The history of industrial statistics may be said to begin from the year 1942, when the Industrial Statistics Act, 1942, was passed. The passing of this Act constitutes a landmark in the development of industrial statistics in our country. The Act granted statutory powers to the State Governments to conduct a census of Indian manufacturing industries and collect statistics relating to them. Although the Industrial Statistics Act was passed in 1942, it became operative only from 1945, when the Directorate of Industrial Statistics was set up at the Centre to enforce the Act. The Directorate framed the Census of Manufacturing Industries Rules (1945), which were adopted by all the Provincial Governments. The first census of manufacturing industries was conducted in 1946 and the results were published in a separate publication by the Central Government. From 1951 the Directorate of National Sample Survey conducted a sample survey of manufacturing industries. The Act of 1942 was repealed by the Collection of Statistics Act, 1953, which came into force on 10th November, 1956. However, the rules regarding the appointment of the statistical authority and the procedure for the collection of industrial data, known as the Collection of Statistics (Central) Rules, 1959, could be finalised only as late as 1959. From 1959 onwards, both the Census of Manufacturing Industries and the Sample Survey of Manufacturing Industries were discontinued and an Annual Survey of Industries was conducted instead. The primary sources of industrial statistics are:
I. Census of Manufacturing Industries (CMI).
II. Sample Survey of Manufacturing Industries (SSMI).
III. Annual Survey of Industries (ASI).
IV. Monthly Statistics of Production of Selected Industries in India.
V. Monthly Abstract of Statistics.
VI. Statistical Abstract of the Indian Union (Annual).
A brief description of each of these is given below.


I. Census of Manufacturing Industries (CMI)


The first Census of Manufacturing Industries was conducted in the year 1946 under the Census of Manufacturing Industries Rules (1945). From 1946 onwards the census was conducted annually. In all, eleven censuses were conducted (1946-56) on a statutory basis.

Censuses were also conducted on a voluntary basis for the years 1944, 1945, 1957 and 1958. There were in all fifteen censuses of manufacturing industries, and the statistics so collected for the years 1946 to 1958 have since been published. The results of the censuses relating to the years 1944 and 1945 were neither tabulated nor published, as only 37 per cent of the factories submitted returns and these were of poor quality.


For the purpose of these censuses, industries were classified under 63 heads on the lines of the United Nations classification. However, the censuses were actually confined to only 29 industries, of which only 28 remained from 1952, because there was no factory in the producer gas plant industry.


The information was collected on prescribed forms modelled on the lines of the forms used in the U.S.A. and the U.K. The form was broadly divided into six parts as follows:

Part A: General information, i.e., name and address of the factory, its location, name and address of its proprietor, managing agent, etc.

Part B: (i) Number of working days during the year. (ii) Capital structure.

Part C: Number of persons employed and the amount of salaries and wages paid.

Part D: Details of the value and quantity of power purchased and consumed during the year, i.e., fuel, electricity, coal, gas, etc.

Part E: Details of the value and quantity of materials purchased and consumed during the year in the manufacture of products and by-products.

Part F: Quantity and value of products and by-products manufactured during the year.
Thus, the Census of Manufacturing Industries provided valuable data, which proved to be of immense help in shaping future economic policies. However, the collected data suffered from certain limitations, e.g.:


(i) The coverage was not complete as only 29 groups of industries 
(28 only since 1952) were covered by the census out of the 63 groups 
defined. Some important industries like tea and electricity generation 
were not included in the census. 


(ii) The schedules were not flexible and there was no possibility of 
change without a tedious legal formality. 


(iii) A large number of factory owners did not submit the returns. 
For example, in 1958 the extent of non-response was 18 per cent. 


II. Sample Survey of Manufacturing Industries (SSMI)


From the year 1951 the Directorate of National Sample Survey 
conducted sample surveys of manufacturing industries. It covered all 
establishments registered under the Factories Act, 1948. Its scope was 
further extended to cover establishments registered or licensed under 
the Industries (Development and Regulation) Act, 1951, in the fourth 
round of the survey conducted in 1954. The survey scheme covered 
the whole of India except Andaman and Nicobar Islands, and establish- 
ments under the Ministries of Defence and Railways. In all, the scheme 
covered 63 industries involving 32,767 establishments of which 3,567, 
or roughly 10% of the total, were selected for sample study. By the 
end of 1958 the scheme covered 8,000 units for sample study. 
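The size of the SSMI sample quoted above can be checked with a line of arithmetic. This only reproduces the two figures given in the text; it says nothing about how the sample was actually drawn.

```python
# Figures quoted in the text for the SSMI sample design.
total_establishments = 32_767
selected_for_study = 3_567

sampling_fraction = selected_for_study / total_establishments
print(f"Sampling fraction: {sampling_fraction:.1%}")
```

The fraction works out to about 10.9 per cent, i.e. just under one establishment in nine, which the text rounds to "roughly 10%".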


The collected data related to capital structure, employment and 
wages, and inputs and outputs of industrial establishments. The last 
annual survey under the scheme was conducted in the year 1958 after 
which the scheme was merged with the "Annual Survey of Industries"
under Collection of Statistics (Central) Rules, 1959. 


CMI and SSMI: A Comparison. The information obtained under the SSMI was wider in coverage than that under the CMI, as the former included all the 63 industries (classified for the purpose of the Census of Manufacturing Industries) whereas the latter included only 29. Besides, the data collected under the SSMI were more reliable than those of the CMI because of the services of experts and trained investigators. It is for these reasons that the National Income Committee preferred statistics from the records of the SSMI.

Although the superiority of the data collected under the SSMI was recognised by all concerned, the SSMI could not replace the CMI and the two schemes continued side by side. The reason was that the results of the SSMI could be used only for broad purposes like making estimates, and not for deciding national policies, for which detailed CMI data were required. The SSMI provided a check upon the inefficiency of the CMI and prevented the latter from stagnating.


III. Annual Survey of Industries (ASI)

Both the CMI and the SSMI were conducted by their respective organisations till the year 1958. However, there was a lot of duplication in their efforts, which caused waste of time, money and energy. In 1959 it was decided that the annual CMI conducted by the States and the SSMI conducted by the Directorate of NSS be replaced by an "Annual Survey of Industries". New rules, known as the Collection of Statistics (Central) Rules, 1959, were therefore framed under the Collection of Statistics Act, 1953. The Annual Survey of Industries (ASI) thus replaced both the CMI and the SSMI. The ASI has been carried out every year from 1959. So far, the final reports of the surveys have been published for the years up to 1969; for 1970 provisional results are available, and that too only for the census sector. The reports for each year are published in ten volumes.


The Annual Survey of Industries is carried out under the authority of the Collection of Statistics Act, 1953. Each factory which receives a notice under the Act is required to furnish all relevant details of its production activities during the year of reference in the prescribed return, known as the ASI Schedule. The ASI Schedule is fairly detailed and provides for information on capital invested, persons employed, salaries and wages paid to employees, fuels and materials consumed for manufacture, and products and by-products manufactured.


Coverage. The ASI covers the entire factory sector, comprising factories registered under Sections 2m(i) and 2m(ii) of the Factories Act, 1948, with the exception of defence installations and oil distribution and storage units. For the purposes of the Survey, the factories are divided into two groups. Group I, called the ‘Census Sector’, consists of all factories employing 50 or more workers with the aid of power, or 100 or more workers without the aid of power. Group II, called the ‘Sample Sector’, covers the remaining registered factories, i.e., those employing 10 to 49 workers with the aid of power, or 20 to 99 workers without the aid of power; these are covered through a probability sample of about 25 per cent. The Act extends to the whole of India except the State of Jammu and Kashmir. However, the factories in Jammu and Kashmir were covered on a voluntary basis.
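The division of registered factories into the two ASI groups follows mechanically from the employment and power criteria above, and can be sketched as below. The factory list is hypothetical, and drawing the 25 per cent sample as a single unstratified simple random sample is an assumption for illustration only; the text does not describe how the actual sample is designed.

```python
import random


def asi_sector(workers: int, uses_power: bool) -> str:
    """Assign a factory to the ASI Census or Sample Sector (thresholds from the text)."""
    census_min = 50 if uses_power else 100   # Group I: Census Sector
    sample_min = 10 if uses_power else 20    # Group II: Sample Sector
    if workers >= census_min:
        return "census"
    if workers >= sample_min:
        return "sample"
    return "not covered"   # below the registration limits of Sections 2m(i)/2m(ii)


def draw_sample_sector(factories, fraction=0.25, seed=None):
    """Simple random sample of about `fraction` of the Sample Sector units
    (an unstratified illustration, not the actual ASI design)."""
    rng = random.Random(seed)
    pool = [f for f in factories if asi_sector(f["workers"], f["power"]) == "sample"]
    return rng.sample(pool, round(fraction * len(pool)))


# Hypothetical factories: (workers, uses power?)
profiles = [(120, False), (60, True), (30, True), (45, True), (12, True), (25, False),
            (80, False), (99, False), (15, True), (35, True), (18, False), (8, True)]
factories = [{"name": f"F{i}", "workers": w, "power": p} for i, (w, p) in enumerate(profiles)]

print({f["name"]: asi_sector(f["workers"], f["power"]) for f in factories})
print([f["name"] for f in draw_sample_sector(factories, seed=3)])
```

Note that the cut-off depends on the use of power: a factory with 80 workers falls in the Sample Sector if it works without power but would be in the Census Sector if it used power.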


Statistics Collected. Data collected under ASI relate to : 


. t. Capital Structure. Showing separately the fixed and working 
capital and transactions relating to fixed capital (replacements, improve- 
ments, and expansions) during the year. 


2. Employment and Wages. Showing average employment and 
wages and other emoluments paid during the year to different cate- 


gories of workers classified as skilled, semi-skilled and unskilled 
workers. 


3. Inputs. Showing the consumption of raw materials, packing
materials and other consumable stores (excluding intermediate
products), repairs and other manufacturing processes done by other
concerns, fuels and lubricants, and other expenses not included in
the above items.


4. Outputs. Showing the quantity and value of goods manufactured,
work-in-progress, by-products, intermediate products and semi-finished
products, including work and repairs done for other concerns, during
the year.

5. Stocks. Showing stocks of raw materials, fuels, products and
by-products at the end of the accounting year.

6. Installed Capacity. Showing the capacity of production during
the year, additional capacity expected, spare capacity and basis of
estimation.

7. Power Equipment. Showing separately the prime movers (classified
as steam engines, internal combustion engines and other prime movers)
and A.C. and D.C. electric motors.


8. Sales. Classified according to types of consumers. 
9. Research. Showing details of industrial research. 


Data under these heads are available industry-wise, State-wise
and for all States taken together. The results of the ASI (Census Sector)
are published in 10 volumes by the CSO and those of the ‘Sample
Sector’ are published by the National Sample Survey Organisation in
its regular reports. Volume I of the report gives the summary tables
together with an introduction and a memorandum on definitions, concepts
and procedures. The remaining 9 volumes give detailed information
relating to various industries and industrial groups.


It will be noticed that this information is very similar to
that collected under the SSMI. However, more detailed information
is now being collected about these items, and information is also being
collected about a number of new items which were included in neither
the SSMI nor the CMI. For the first time, statistics relating to the
following aspects are collected under ASI :

(i) Skilled, semi-skilled and unskilled workers. 

(ii) Labour and management relations. 
(iii) Equipment installed, other than power equipment.
(iv) Training facilities given by the factories. 


(v) Installed capacity of production. 

(vi) Sales effected during the year classified according to the type 
of consumers. 

(vii) Industrial research. 

According to the provisional summary results of the Annual
Survey of Industries for 1970*, there were 13,598 (13,101)** registered
factories employing 50 or more workers with the aid of power or 100
or more workers without the aid of power, an increase of 3.79
per cent over 1969. Of these, the factories which reported data
numbered 13,279 (12,754), an increase of 4.11 per cent over
1969. With a productive capital of Rs. 11,105.8 (Rs. 9,933.1) crores,
these factories provided employment to 42.6 (41.5) lakh persons and
paid an annual wage bill of Rs. 1,516.9 (Rs. 1,341.0) crores. This
represented an increase of 11.89 per cent in productive capital, 2.60
per cent in employment and 13.09 per cent in the wage bill. With an
input of Rs. 7,957.2 (Rs. 7,010.9) crores, the factories produced an
output of Rs. 11,399.4 (Rs. 9,991.2) crores. As a result, the net value
added to the national economy was Rs. 2,878.2 (Rs. 2,467.5) crores.
The increase was 13.26 per cent in input, 14.09 per cent in output and
16.64 per cent in net value added.

* Source : India, 1976, pp. 252-53.
** Figures in brackets are for 1969 and are given for comparison.
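Each growth rate in the ASI summary above is the percentage change of the 1970 figure over the bracketed 1969 figure. The small sketch below (the helper name `percent_increase` is ours, not the CSO's) recomputes three of the quoted rates from the printed figures.

```python
def percent_increase(current: float, previous: float) -> float:
    """Percentage change of `current` over `previous`, rounded to 2 decimals."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return round((current - previous) / previous * 100, 2)


# 1970 figures with the 1969 figures (given in brackets in the text).
asi_1970_vs_1969 = {
    "factories (number)": (13_598, 13_101),
    "output (Rs. crores)": (11_399.4, 9_991.2),
    "net value added (Rs. crores)": (2_878.2, 2_467.5),
}

for item, (y1970, y1969) in asi_1970_vs_1969.items():
    print(f"{item}: {percent_increase(y1970, y1969)} per cent")
# factories (number): 3.79 per cent
# output (Rs. crores): 14.09 per cent
# net value added (Rs. crores): 16.64 per cent
```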


IV. Monthly Statistics of the Production of Selected Industries 


This is a monthly publication of the Central Statistical Organisation
(Industrial Statistics Wing), Department of Statistics, Cabinet
Secretariat, Government of India. It contains serial statistics of
production and stock of various industries on an all-India basis. The
monthly production of the different industries is estimated from the
returns received from the occupiers of factories. The returns are
submitted voluntarily by the producers except in the case of coal,
jute mill machinery, salt, cotton textiles, woollen textiles, iron and steel,
minerals except petroleum and gold, where the collecting agencies have
powers to call for returns.


Scope and Coverage. The production statistics relate to most of the
major industries. In the case of well-organised industries like cotton
and jute textiles, iron and steel, and sugar, they cover almost the entire
output. In the case of the remaining industries they cover the production
of all units except small ones, for which the collection of monthly
statistics is not practicable.


Classification. Industry has been classified according to the
National Standard Industrial and Occupational Classification. The
monthly statistics are classified into three divisions :

(a) Mining and Quarrying. 

(b) Manufacturing. 

(c) Electricity, Gas, Water and Sanitary Services.


Installed Capacity. No installed capacities have been given in 
cases like coal, other minerals, tea, salt and gold, etc. 


Production by States. The monthly statistics of production by
States were introduced for the first time in the November 1958 issue,
covering 10 industries. The number of industries in this series now
stands at 79.


Index of Industrial Production. The journal gives the index of
industrial production with 1970 as base.


Stocks. Since June 1954 it also contains the statistics of stocks of
finished products at factories at the end of the month.


Limitations of the Data. The utility of this publication is reduced 
to some extent on account of the fact that the d 


V. Monthly Abstract of Statistics 


This is a monthly publication of the Central Statistical Organisation,
Department of Statistics, Cabinet Secretariat, Government of
India. Besides data on other aspects, it contains the following data
pertaining to industries :


(i) Index Number of Industrial Production. 


(ii) Industrial Production: Food and Textile Manufacture, Wood,
Cork and Paper Manufacture, Rubber tyres, Chemicals and chemical
products, Petroleum products, Non-metallic mineral products, Basic
metal industries, Metal products except machinery and transport
equipment.

(iii) Fuel and Power: production and distribution of coal;
production and distribution of electricity.

(iv) Consumption and Stocks: consumption of raw materials and
manufactures; stocks of raw materials and manufactured goods.


VI. Statistical Abstract of the Indian Union 


This is an annual publication of the CSO. It contains statistics
on various aspects, including industry and power. All-India totals are
provided in most of the tables for the latest five or ten years. In
addition, data for the beginning of each Five-Year Plan period are
given to facilitate study of the magnitude of growth in the inter-Plan
periods. Detailed State-wise figures are given for the latest year for
which they are available.


Besides these publications containing information in respect of
most of the industries, there are specialised publications for particular
industries also. For example, the Monthly Jute Bulletin, published
by the Indian Central Jute Committee, Calcutta, gives data relating to
jute trade and industry, statistics of production and stocks, consumption
and export of jute goods, etc. Similarly the Monthly Coal Bulletin,
issued by the Chief Inspector of Mines, Ministry of Labour and
Employment, Dhanbad, contains figures of production, despatches and
stocks of coal, employment, wages, accidents, etc. In respect of the
cotton textile industry there is the Monthly Statistical Bulletin, Indian
Cotton Textile Industry, published by the Textile Commissioner,
Government of India. In respect of the sugar industry there is a monthly
publication entitled Indian Sugar, published by the Sugar Manufacturers'
Association.


Besides these primary sources, there are some secondary sources
of industrial statistics also. Important amongst these are :


I. Report on Currency and Finance.
II. The Journal of Industry and Trade.


I. Report on Currency and Finance

This is an annual publication of the Reserve Bank of India. Besides
data on other aspects, it gives statistics of output of major industries
classified under the following heads : (a) Basic industries, (b) Capital
goods industries, (c) Intermediate goods industries, and (d) Consumer
goods industries.

II. The Journal of Industry and Trade


It is a monthly publication of the Directorate of Commercial
Publicity, Ministry of Commerce. It contains detailed statistics of
industrial production classified under 14 groups of industries. It
contains quantitative data and not the value of production; monthly
figures from March to April of the previous year are accompanied by
the annual data for 1957 onwards.


Statistics of Small-scale and Cottage Industries* 


A study of the industrial statistics of India will be incomplete
without a reference to the statistics of small-scale and cottage
industries. The contribution of small industries to the national income of
India is estimated at 8% as against 10% accounted for by large
enterprises. It is also estimated that 40% of the industrial output in India
in the registered manufacturing sector can be attributed to small
enterprises. But as regards the availability of statistical data on small-scale
industries, the position is not satisfactory. Expressing its view in the
Third Five-Year Plan, the Planning Commission observed : “Although
surveys of a number of industries and specific areas have been carried
out by different agencies and organisations in the past, basic statistical
data for small-scale industries for the country as a whole, which are
essential for making a quantitative assessment of the impact of the
programmes and for drawing up new plans, are still lacking.”


The only sources of statistics relating to small enterprises for the
whole country are the occupational tables in the Census of India, data
collected by the NSSO in its various rounds, reports on some surveys
conducted under the auspices of the Research Programmes Committee of
the Planning Commission, and the Agro-Economic Surveys sponsored
by the Ministry of Agriculture and Irrigation. The NSSO in its
various rounds has conducted some surveys on household industry. The
data collected in the 7th to 10th and 14th rounds related to total
employment, productivity, value added, etc. The results of these surveys
have been published in NSSO Reports Nos. 19, 21, 42, 43 and 94,
which are entitled ‘Tables with Notes on Small-scale Manufacture:
Rural and Urban, Household Enterprises smaller than Registered
Factories’.


Besides the ad hoc surveys that are conducted from time to time,
regular data on certain aspects of small-scale industries can be obtained
from the following sources :


(i) Annual Report on Currency and Finance, published by the
Reserve Bank of India. This contains statistics relating to the number
of small enterprises assisted and the amount of financial assistance
given by the State Bank of India and its subsidiaries, the role of the
National Small Industries Corporation in providing assistance to
small-scale industries, the number of applications received for guarantee,
and the number and amount of guarantees sanctioned to small-scale
industries under the “Credit Guarantee Scheme” of the Government
of India.

(ii) Annual Report on the Working of the National Small Industries
Corporation.

(iii) Annual Report on the Working of Small Industries Corporations
in different States.

* Cottage industries are distinguished from small-scale industries. The latter
include establishments which employ capital not exceeding Rs. 10 lakhs, irrespective
of the number of persons working therein.


Besides these, the Development Commissioner, Small-scale Industries,
Ministry of Industry, brings out the following publications :


Sl. No.  Title                                                  Periodicity
1.       Report on the Working of Central Small Industries
         Organisation                                           Monthly
2.       -do-                                                   Half-yearly
3.       -do-                                                   Annual
4.       Small-scale Industries at a Glance                     Half-yearly
5.       Annual Report for the Organisation of the
         Development Commissioner (Small-scale Industries)      Annual


Source : C.S.O. : Statistical System in India (1970), p. 20.


Limitations of Industrial Statistics 


It will be clear from the description given here that it is only after
independence that we have data worth the name on various aspects of
industries. However, there is considerable scope for improvement of
such data. The main limitations of industrial statistics are :


1. There is a considerable delay in the publication of the findings
of the Annual Survey of Industries (a time-lag of six years; the latest
figures relate to 1970). This delay ought to be minimised as far as
possible so that the data are really useful for purposes of analysis and
interpretation.


2. As in the CMI, in the ASI also the concepts and definitions of
various terms have been borrowed wholesale from the Factories Act,
the Payment of Wages Act, etc. These definitions in many cases are
vague and not suitable for industrial purposes, and fresh definitions and
concepts ought to be laid down for purposes of industrial statistics.
For example, ‘ex-factory’ value has not been satisfactorily defined: it
is not clear whether values have to be calculated at ‘market price’
or ‘factory price’. Similarly, the concept of intermediate products needs
proper definition.


3. In the case of the unorganised sector, i.e., the small-scale and
cottage industries, hardly any data are available. In view of the
growing importance of this sector there is an urgent need for setting
up a proper organisation for the collection, compilation and publication
of data in respect of the unorganised sector. We do not have any reliable
data regarding inputs and outputs, employment of workers of different
types and capital investment. Two suggestions can be made to improve
the availability of statistics in respect of the unorganised sector : (a) the
Annual Survey of Industries should be extended to cover the
unorganised sector as well, and (b) small-scale enterprises may be asked
to submit information on a statutory basis.

Index Number of Industrial Production

The index number of industrial production is of immense use in
measuring the economic progress of any country. The index indicates
changes over time in the volume of production of non-agricultural
commodities: it measures, at regular intervals, general movements in the
volume of industrial output. The index is compiled by both official and
non-official agencies. Here only the official index is discussed.


A number of indices of industrial production have been constructed
in India from time to time and have been discontinued in favour
of better schemes. A brief account of them is given below :


The first official index of industrial production was computed by 
the office of the Economic Adviser, Ministry of Commerce and 
Industry, with the year 1937 as base. It covered the mining, manufacturing
and electricity groups of industries and extended to 15 industries in all.
Weights were assigned according to the total value of output, and a
weighted arithmetic average was used. Adjustments were
made for seasonal variations. The index was called the ‘Interim Index
of Industrial Production’.


The new series of the interim index of industrial production was
started from January 1949. The year 1946 was adopted as the base.
The index covered 36 items of 20 industries. Figures of production
were taken from the Monthly Statistics of Production of Selected
Industries of India. This series was discontinued from April 1956. A
revised index was published in October 1955, with 1951 as the base year.
The index covered 88 items classified according to the International
Standard Industrial Classification of all Economic Activities.


The index with 1951 as base was replaced in July 1962 by another 
series with base 1956 and covering 201 items of industrial production. 
In order to make the index number of industrial production reflect 
adequately the recent industrial growth in the country, the series with 
1956 as base was replaced by a revised series with 1960 as base. This 
was done in accordance with the recommendations of an ad hoc working 
group set up by the Central Statistical Organisation. In June 1974, the 
index was again revised and its base was changed to 1970 = 100. The
index is obtained as a weighted arithmetic mean calculated by the
formula :

$$I = \frac{\sum_i R_i W_i}{\sum_i W_i}$$

where $I$ is the index, $R_i$ the production relative for the $i$th item for
the month in question and $W_i$ the weight of the $i$th item.
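To make the weighted arithmetic mean $I = \sum_i R_i W_i / \sum_i W_i$ concrete: each production relative $R_i$ is taken here as the current month's output of item $i$ expressed as a percentage of its base-year (1970) output, and the index is the weighted mean of these relatives. The sketch below uses three hypothetical items and hypothetical weights, not the official weighting diagram; the function names are illustrative only.

```python
from typing import Sequence


def production_relative(current_output: float, base_output: float) -> float:
    """Current-month output as a percentage of base-period output (R_i)."""
    if base_output <= 0:
        raise ValueError("base output must be positive")
    return current_output * 100 / base_output


def weighted_index(relatives: Sequence[float], weights: Sequence[float]) -> float:
    """I = sum(R_i * W_i) / sum(W_i), the weighted arithmetic mean of relatives."""
    if len(relatives) != len(weights):
        raise ValueError("need exactly one weight per item")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r * w for r, w in zip(relatives, weights)) / total_weight


# Hypothetical example: three items with base-year and current-month output.
base = [50.0, 200.0, 40.0]
current = [60.0, 190.0, 60.0]
weights = [50, 30, 20]

relatives = [production_relative(c, b) for c, b in zip(current, base)]
print(relatives)                           # [120.0, 95.0, 150.0]
print(weighted_index(relatives, weights))  # 118.5
```

Dividing by $\sum_i W_i$ means the weights need not add up to any particular total; only their relative sizes matter.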

The above formula gives the ‘crude’ index. The crude monthly
general index is to be adjusted for seasonality by appropriate seasonal
factors derived from the crude general index by the 12-month
moving average method.
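The text names the 12-month moving average method but does not spell out how the seasonal factors are computed. One standard reading is the classical ratio-to-moving-average procedure, sketched below under that assumption: a centred 12-month moving average estimates the trend, the ratio of the crude index to that trend is averaged month by month to give seasonal factors, and each crude value is divided by the factor for its month. The function names and the three-year test series are hypothetical, and the official CSO procedure may differ in detail.

```python
from statistics import mean


def centred_12_month_ma(series):
    """Centred 12-month (2x12) moving average; None where it is undefined."""
    n = len(series)
    trend = [None] * n
    for t in range(6, n - 6):
        # Average two overlapping 12-month means so the window centres on month t.
        first = sum(series[t - 6:t + 6]) / 12
        second = sum(series[t - 5:t + 7]) / 12
        trend[t] = (first + second) / 2
    return trend


def seasonal_factors(series, start_month=0):
    """Seasonal factors by the ratio-to-moving-average method, scaled to average 1.

    series[0] is taken to fall in calendar month `start_month` (0 = January).
    """
    if len(series) < 24:
        raise ValueError("need at least 24 monthly values")
    trend = centred_12_month_ma(series)
    ratios = {m: [] for m in range(12)}
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            ratios[(start_month + t) % 12].append(value / level)
    raw = [mean(ratios[m]) for m in range(12)]
    scale = mean(raw)
    return [f / scale for f in raw]


def seasonally_adjust(series, start_month=0):
    """Divide each crude monthly index by its month's seasonal factor."""
    factors = seasonal_factors(series, start_month)
    return [v / factors[(start_month + t) % 12] for t, v in enumerate(series)]


# Hypothetical crude index: a fixed seasonal pattern (mean 100) repeated for 3 years.
pattern = [80, 90, 100, 110, 120, 100, 100, 90, 100, 110, 100, 100]
crude = [float(v) for v in pattern * 3]
adjusted = seasonally_adjust(crude)
print([round(v, 6) for v in adjusted[:3]])  # [100.0, 100.0, 100.0]
```

Because the test series is a pure seasonal pattern with no trend, every adjusted value comes out at 100. With real data the adjusted series keeps the trend and irregular movements and removes only the estimated seasonal swing.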

Limitations of the Current Index. Despite the many improvements
that have been made from time to time in scope, coverage, base year,
technique, weighting diagram, etc., the current index suffers from the
following limitations :


1. The information in the case of most industries is collected
from large-scale units licensed under the Industries (Development and
Regulation) Act, 1951, i.e., units employing 50 or more workers and using
power, or employing 100 or more workers and not using power. This leaves
a very large and substantial portion of industrial production outside
the scope of the index.


2. The production of units which do not submit returns is
estimated, which introduces errors of judgment.


3. The weighting diagram is based on information obtained from
a wide variety of sources and is not uniform in quality.


TRY YOURSELF
1. Write a critical note on the nature, scope and quality of industrial statistics
in India.

2. If you are asked to construct index numbers of industrial production in 
India, what basic data would you require ? Comment on the sources and availability 
of data. Describe the method you would follow in the construction of these index
numbers. 

3. Write a note on the Annual Survey of Industries—scope and coverage. 

4. Write a critical note on the nature, scope and limitations of statistics
relating to the manufacturing industries of India.

5. Write a short note on the Index Number of Industrial Production in 
India. 

6. Write a critical note on the nature and scope of the statistics of industrial 
production available in India at present. 

7. Briefly discuss the nature of industrial statistics available in the Annual
Survey of Industries.

8. What are the important sources of industrial statistics in India ? 

9. What does an index of industrial production attempt to measure ? Describe
the scope and method of construction of the current official index of industrial
production. What are its limitations, if any, and what steps would you suggest
for its improvement ?

10. Write a detailed note on the nature, scope and sources of Industrial
Statistics in India. (B. Com., Delhi, 1971)

11. What are the sources of industrial statistics of India ? Are they
adequate ? (M.A. Econ., Jabalpur, 1975)

12. State the scope of the ‘Annual Survey of Industries’ conducted
by the Government of India and throw light on the methodology adopted for
the purpose. (M.A. Econ., Jabalpur, 1975)

13. List the sources of official statistics relating to manufacturing industries
in India separately for the large-scale and small-scale sectors. Comment on their
coverage. (B.A. Hons. Econ., Delhi, 1976)


Section 4


Trade Statistics


Nature 

Trade statistics, which include both inland trade statistics and
foreign trade statistics, are of immense use to businessmen, traders,
agriculturists, manufacturers, research workers and the Government.
These statistics, if properly collected and maintained, show the volume
and value of trade carried on within the country and also with other
countries. Detailed figures of country-wise and commodity-wise trade
throw light on the trends of trade in the economy and enable the
industrialists and the Government to formulate their long and short-term
policies with regard to the manufacture of commodities. The regional
taxation policy of the Government also depends to a very large extent
on well-maintained trade statistics.


Historical Development 


Trade in India with other countries of the world has been carried
on since ancient times. However, the history of our trade statistics
can be traced to the establishment of the East India Company, which
had virtually obtained a monopoly of our foreign trade. The Company
maintained certain accounts for submission to Government authorities.
Thus certain estimates relating to the volume and direction of our
external trade emerged as a by-product of administrative activity.
In 1869 the Suez Canal was thrown open for navigation. This resulted
in an enormous increase in our exports and imports, and statistics began
to emerge as a result of the administration of laws relating to the
taxation of goods entering or leaving the country. However, the real
development in the field of trade statistics took place in 1905, when the
Department of Commercial Intelligence and Statistics was established.
One of the main functions of this department was to collect commercial
statistics, to help trade and to act as an intermediary between
India and foreign businessmen. In 1906 the department
brought out the first issue of the Indian Trade Journal.


Trade statistics of India have been in a fairly satisfactory state
from the very beginning.


Over a period of time these statistics have improved considerably 
both in scope and coverage. For the sake of simplicity trade statistics 
are discussed under the following heads : 


I. Statistics of Inland Trade. 
II. Statistics of Foreign Trade. 
III. Index Numbers of Imports and Exports.


I. Statistics of Inland Trade
Inland trade statistics can be sub-divided under two heads : 


(A) Wholesale Trade Statistics. 
(B) Retail Trade Statistics. 


(A) Wholesale Trade Statistics. Broadly speaking, the entire
inland wholesale trade of the country may be divided into the following
three categories, for which statistical data are separately available :

(i) wholesale trade of one State with another State, called inter-State
trade;

(ii) wholesale trade of one port with another port, called inter-port
trade;

(iii) wholesale trade of a State with a port.


Most of the trade in India is carried by the railways. However,
roads are fast developing as an important competitor of rail traffic.
The bullock-cart is also quite popular in villages as a carrier of goods
from one place to another. The trade between different States or
regions of India may be conveniently classified under the following
sub-heads :

1. Rail and River-borne trade. 

2. Road-borne trade. 

3. Air-borne trade. 

4. Coasting trade. 

A brief description of each of these is given below :


1. Rail and River-borne Trade. Figures of inland rail and
river trade are available in the official quarterly (previously monthly)
journal Accounts Relating to the Inland (Rail and River-borne) Trade of
India. This journal is published by the Department of Commercial
Intelligence and Statistics, Calcutta. Compiled statistics are obtained
from railway and steamer authorities, to whom traders have to submit
their invoices. In this publication trade items are classified into 30 main
groups divided into 67 items. The publication contains only such
quantities of commodities as are declared by railways and steamer
agencies. Quantity statistics are taken from invoices and do not
include packing and coverings. In addition to merchandise, the publication
also provides figures regarding the movement of treasure in the country.

Rail and river-borne trade statistics suffer from the following
defects :

(i) Limited coverage. Not all commodities are covered. Even in the
figures of river-borne trade only two well-known companies are taken
into account and others are left out. These companies are the ‘Indian
General Navigation and Railway Company’ and the ‘River Steam
Navigation Company Ltd.’

(ii) Absence of value figures. Figures of the value of traffic 
handled are not given. 

(iii) Trade by boat is not included. 

2. Road-borne Trade (carried by trucks and other country
crafts). In recent years the road truck service has emerged as a serious
competitor of the railways. However, it is unfortunate that no
statistics worth the name are available regarding goods carried by
lorries from one place to another. The same is true of trade carried by
other means such as bullock-carts, donkeys and other animals, though
in places these country crafts and agencies play a vital role in the
movement of goods traffic. In view of the growing importance of
road-borne trade there is an urgent need for authentic statistics.


3. Air-borne Trade. With regard to air-borne trade also the
position is not very satisfactory. Though the entire air transport is
organised and regulated by the Government of India, no statistics
other than those of freight are being collected by the Directorate of
Civil Aviation. The only available statistics relate to the total monthly
freight carried, recorded separately for each airport under two heads :
(i) off-loaded, and (ii) on-loaded. The main gap in these statistics
is that a commodity-wise and region-wise breakdown of movements
is not available.


4. Coasting Trade. Statistics relating to the coasting trade of
India are published in a monthly bulletin entitled Statistics of Foreign
and Coasting Cargo Movements of India and in the annual publication
Statistics of Coastal Trade of India, published by the Department of
Commercial Intelligence and Statistics. Statistical tables relating to
navigation in the coasting trade of India are now available in another
official publication of the same Department entitled Statistics of
Maritime Navigation of India. Statistics in the above publications are
based on monthly returns submitted to the DGCIS by the customs
authorities in India. The monthly returns are based on shipping documents,
i.e., bills of entry and shipping bills submitted by traders. Coasting
trade is registered separately from trade in the customs houses. For
the registration of coastal trade the Indian coast has been divided into
twelve maritime blocks since April 1963.


Statistics of coasting trade are recorded under two broad heads : 


(i) internal trade, i.e., trade amongst the ports within the same
maritime block, and


(ii) external trade, i.e., trade between one maritime block on the
one hand and all other maritime blocks on the other.


(B) Retail Trade Statistics. With regard to retail trade
statistics, the position is much worse. Practically, such statistics do
not exist for making any long-term analysis. There is no single
publication devoted exclusively to this purpose. The Indian Trade Journal,
a weekly publication of the Department of Commercial Intelligence and
Statistics, contains some statistics on retail trade. Such statistics
are also available in weekly newspapers and reports from trade centres.
However, they are grossly inadequate.

Defects of Inland Trade Statistics
Inland trade statistics are far from complete and need considerable
improvement. The important defects are :

(i) Inland trade statistics do not provide value figures of the
commodities traded.

(ii) Available statistics do not include goods transported by road
using trucks, bullock-carts and animals like camels, donkeys, etc.,
although these form a considerable proportion of total commodity trade
in the country.

(iii) Coverage commodity-wise is not complete. 

(iv) River-borne statistics do not cover all trade blocks but are 
confined to a few in Northern India. 

(v) Coasting trade statistics do not give the sources and destinations 
of exports and imports from each trade block. Thus the inter-State and 
inter-regional balance of trade cannot be determined. This is important 
from the viewpoint of studying the development of inland trade. 

(vi) There are hardly any statistics about retail trade. 

(vii) There is hardly any information available regarding the 
movement of goods by road and air. 

There are three main factors responsible for the inadequacy of
our internal trade statistics :

(i) The existence of a large number of middlemen in our trade 
organisation. 

(ii) A substantial amount of trade in India is still carried out on 
the barter system, making it difficult to estimate the value involved. 

(iii) A wide variety of transport services from the pedlar to the 
aeroplane is in vogue in India and the task of keeping a check on their 
movement is almost impossible. 


II. Foreign Trade Statistics 


Compared to the inland trade statistics, the position is much more
satisfactory with regard to the availability of foreign trade statistics.
The following are the important official publications on foreign trade :


1. Monthly Statistics of the Foreign Trade of India, Volume I
(Exports and Re-exports).
2. Monthly Statistics of the Foreign Trade of India, Volume II
(Imports).
3. Supplement to the Monthly Statistics of the Foreign Trade of India,
Volume I (Quarterly).
4. Supplement to the Monthly Statistics of the Foreign Trade of India,
Volume II (Quarterly Issue).
5. Annual Statistics of Foreign Trade of India by Countries.
6. Annual Statistics of Foreign Trade of India by Customs Zones.
7. Customs and Excise Revenue Statement of the Indian Union.
8. Annual Supplement to the Foreign Trade of India.
9. Statistics of the Maritime Navigation of India (Annual).
10. The Journal of Industry and Trade (Monthly).

1. Monthly Statistics of the Foreign Trade of India, Vol. I: Exports
and Re-exports. This is a monthly publication of the Government of
India, Department of Commercial Intelligence and Statistics,
Calcutta.

Scope of the publication. The statistics contained in this publication 
relate to exports and re-exports of merchandise from all the sea-ports, 
airports and customs stations of the country. 

Data on postal exports are shown without commodity details.
Commodity-country details of such exports are published in the
Supplement to the Monthly Statistics of the Foreign Trade of India, Vol. I.

Coverage. The following items of trade are included : (a) silver
(other than current coins), notes and coins withdrawn from circulation
or not yet issued, (b) indirect transit trade, and (c) exports by parcel
post and letter post.

The following categories of trade are excluded : (a) direct transit
trade, (b) transhipment trade, (c) passengers' baggage, and (d) ships'
stores.

Source of data. The statistics are based on declarations made by 
exporters as subsequently checked by customs officials. 

Commodity classification. Commodities are classified according to
the Revised Indian Trade Classification (1965) except in the case of
petroleum products and ‘prescribed substances’ under the Atomic Energy
Act, 1962. With effect from the issue for April, 1963, the classification
has been amplified.

The journal contains summary tables as well as detailed tables. 
Summary Tables relate to the following : 

Table 1. India's Exports and Re-exports by Sections, Divisions
and Groups. 

Table 2. India’s Exports and Re-exports with each specified 
country. 

Detailed tables relate to : 

Table 3. India’s Exports by Commodities-countries. 

Table 4. India’s Re-Exports by Commodities-countries. 

2. Monthly Statistics of the Foreign Trade of India, Vol. II—Imports. This volume is also issued by the Department of Commercial Intelligence and Statistics, Calcutta.

Scope of the publication. The statistics relate to imports of merchandise into all the sea-ports, airports and land customs stations of the country.

Coverage. The following items of trade are included : (a) silver 
(other than current coins), notes and coins withdrawn from circulation 
or not yet issued, (b) indirect transit trade, (c) imports by parcel post, 
(d) dutiable articles by letter post, and (e) defence stores. 
The following categories of trade are excluded : (a) direct transit trade, (b) transhipment trade, and (c) passengers' baggage.

Source of data. The statistics are based on declarations made by importers as subsequently checked by customs officials.

This journal also contains Summary tables and Detailed tables. 
Summary tables relate to : 

Table 1. India’s Import Trade by Sections, Divisions and Groups.

Table 2. India's Import Trade with each specified country.

Detailed table relates to : 

Table 3. India's Imports by Commodities-countries. 

3. Supplement to the Monthly Statistics of the Foreign Trade of India, Vol. I. This is a quarterly publication of the Directorate of Commercial Intelligence and Statistics, Calcutta.

It contains the following eleven tables : 

(a) Value of foreign trade. 

(b) Overall balance of trade. 

(c) Foreign trade of customs zones. 

(d) Foreign trade in categories of commodities with each economic region.

(e) Foreign trade of neighbouring countries passing through India (Transit Trade).

(f) Index Numbers.

(g) Quantity and value of principal articles of export. 

(h) Quantity and value of principal articles of import. 

(i) Articles warehoused under bond. 

(j) Foreign trade in treasure. 

(k) Quantity and value of principal articles exported by post to foreign countries.

4. Supplement to the Monthly Statistics of the Foreign Trade of India, Vol. II. This quarterly publication, showing statistics of trade with the different countries by broad categories of goods in terms of value, was published till March, 1962. Publication was then suspended as an emergency measure. Because of the increasing demand for these statistics, it was revived from December, 1964. The contents of this publication are :

(i) Summary of India's foreign trade by countries and Economic Regions.

(ii) Table showing India’s trade in principal commodities by countries :

(a) Africa. 

(b) North America. 

(c) Latin America. 
(d) Other American countries.

(e) ECAFE countries.

(f) Other Asian and Oceania countries and Antarctic and Arctic Regions.

(g) East European countries.

(h) European Common Market countries.

(i) European Free Trade Area.

(j) Other European countries.


5. Annual Statistics of Foreign Trade of India by Countries. This annual issue shows, in a revised and verified form, the monthly totals of foreign trade published in the monthly issues. The monthly figures need revision and correction because of short shipments in exports and the finalisation of ‘note pass’ transactions in imports.

6. Annual Statistics of Foreign Trade of India by Customs Zones. This is a bilingual (Hindi-English) publication which provides annual statistical data on India’s foreign trade for all the eight zones, i.e., Calcutta, Madras, Cochin, Bombay, Delhi, Patna, Baroda and Shillong. The first issue was published for the calendar year 1957 and the second issue relates to the calendar years 1958 and 1959. It has been decided to merge this publication with the regular annual publications.

7. Customs and Excise Revenue Statements of the Indian Union. This is a monthly publication which provides details of revenue accruing from the levy of customs and Union excise duties, cesses, etc. These statistics are based on the monthly returns submitted to the DGCIS by customs and Central excise authorities.

8. Annual Supplement to the Foreign Trade of India. This annual issue shows, in a revised and verified form, the monthly totals of foreign trade published in the monthly issues; the monthly figures are revised and corrected for short and shut-out shipments in exports and for the finalisation of ‘note pass’ transactions in imports. The publication provides detailed annual statistical information relating to the following aspects :

(i) Total trade with each specified country. 
(ii) Commodity-wise figures of exports, re-exports and imports 
separately for each country. 

(iii) Exports of bunker fuel. 

(iv) Outward and inward transit trade. 


9. Statistics of the Maritime Navigation of India (Annual). This publication was started in January 1957 after the adoption of the new trade classification. It provides statistical information on the shipping trade of India. Statistics in this publication are based on monthly and annual returns received from customs and Central authorities at sea-ports. Indian sea-ports have been grouped into five customs zones for the purpose of recording shipping statistics. These zones are : West Bengal, Madras, Cochin, Bombay and Baroda.
Statistical information contained in this publication has been divided into four parts, namely :

Part I—Shipping in Foreign Trade. 

Part II —Shipping in Coasting Trade. 

Part III— Tonnage of cargo handled at major ports of India. 

Part IV— Vessels built and registered. 

The latest issue of this publication relates to the year 1965-66. 


10. The Journal of Industry and Trade. This is a monthly publication of the Directorate of Commercial Publicity, Ministry of Commerce, Government of India, New Delhi.


The statistical section of the Journal is divided into two parts :
(i) Industrial Production, and 

(ii) India's Foreign Trade. 

The Foreign Trade section contains the following tables supplying information on different aspects :

(i) Overall Balance of Trade. 
(ii) Exports of principal Items. 

(iii) India's Imports of Principal Commodities. 

(iv) Direction of India's Export Trade. 

(v) Direction of Trade. 

Besides publications giving information exclusively on internal 
trade or external trade, there are certain publications which contain 
some data both regarding internal as well as foreign trade. These are: 

(A) Statistical Abstract of the Indian Union (Annual). 

(B) The Indian Trade Journal (Weekly). 


A brief description of these is given below : 

(A) Statistical Abstract of the Indian Union (Annual). This is an annual publication of the CSO. In this publication the statistics of trade are classified under four groups :

(i) Foreign trade by sea and air. 

(ii) Coastal trade. 

(iii) Inland trade. 

(iv) Land frontiers trade. 

(B) The Indian Trade Journal. The Indian Trade Journal is the weekly organ of the Department of Commercial Intelligence and Statistics, Government of India, Calcutta. Though not an exclusively statistical publication, it provides data on the following aspects of trade :

(i) Weekly figures of exports and imports of selected commodities 
—data being collected from shipping bills. 

(ii) Weekly despatches and arrivals of certain staple commodities 
at selected centres. Figures published are taken from reports which 
are given by railways and steamer authorities. 
(iii) Monthly foreign sea-borne trade of India. It is in summary form, giving figures of exports and imports, private merchandise and visible items of trade.


Besides these publications, the Reserve Bank of India Bulletin con- 
tains the following trade statistics : 


(i) India's Overall Balance of Payments— Current Account. 
(ii) India's Overall Balance of Payments —Capital Account. 


(iii) Foreign Trade giving separately the figures of (a) Merchan- 
dise, and (b) Treasure (gold). These are again classified into (i) Imports, 
(ii) Exports, and (iii) Balance of trade. 


These statistics are reproduced from the Supplement to the Monthly Statistics of the Foreign Trade of India.


The January issue of the Reserve Bank of India Bulletin also contains the indices of imports and exports.


III. Index Numbers of Quantum and Unit Value of Exports and Imports

Index numbers of foreign trade of India are compiled by the Department of Commercial Intelligence and Statistics, Calcutta. Separate indices are available for imports and exports, and for each of these both unit value and volume indices are available. An annual general index number of imports and exports is also published. These indices are published in the Supplement to the Monthly Statistics of the Foreign Trade of India, which is a quarterly publication of the DGCIS. A brief description of the index is given below :


Base year. Formerly the base of the index numbers of imports and exports was the financial year 1948-49. This was later shifted to 1952-53 and then to the year 1958. Recently the year 1970 has been taken as the base. The changes in base year were necessary because of changes in the pattern of India's foreign trade and in the system of classification of the items. From January 1957, the old classification of items under three main heads, namely : (i) food, drink and tobacco ; (ii) raw materials ; and (iii) manufactured articles, has been replaced by a more elaborate classification. The items covered in the indices are sub-classified into nine heads as follows :


Food. 

Beverages and tobacco. 

Crude materials, inedible, except fuels. 

Mineral fuels and lubricants, etc. 

Animal and vegetable oils and fats, etc. 
Chemicals. 

Manufactured goods classified chiefly by materials. 
Machinery and transport equipment. 
Miscellaneous manufactured articles. 


The index covers all foreign trade: sea, air and land. In the construction of the index numbers, re-exports have been excluded from exports. Separate indices are available not only for each of the groups given above but also for each item included in a group.


With the adoption of the new base year in accordance with the new trade classification, the revised series has become more representative. The coverage of import items is now 84 per cent of the total value of imports of all items, as against 70 per cent in the old series. In all, 511 items of import and 317 items of export are taken into account in calculating the present series.

Method of construction. The unit value index numbers of exports and imports are prepared by the weighted aggregative method, in which current-period quantities are used as weights. The quantum index is prepared by a method in which base-period values are used as weights.

The unit value (price) index numbers are calculated according to Paasche's formula, i.e.,

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

where $p_0$ and $p_1$ denote the unit value of an article in the base period and the current period respectively, and $q_1$ its quantity in the current period.

The quantum index is calculated indirectly from the value ratio and the unit value index. The Quantum Indices ($Q_{01}$) are derived from the Unit Value Indices ($P_{01}$) as follows :

$$Q_{01} = \frac{\left( V_1 / V_0 \right) \times 100}{P_{01}} \times 100$$

where $V_0$ and $V_1$ are the total values of trade in the base and current periods respectively.
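The two formulas can be checked numerically. The Python sketch below uses two hypothetical articles (the unit values and quantities are invented for illustration and are not DGCIS data). It computes the Paasche unit value index and then derives the quantum index from the value ratio. Note that the derived quantum index equals $\sum p_0 q_1 / \sum p_0 q_0 \times 100$, i.e. a quantity index weighted by base-period values, which agrees with the method of construction described above.

```python
def unit_value_index(p0, p1, q1):
    """Paasche unit value (price) index: sum(p1*q1) / sum(p0*q1) * 100."""
    current_value = sum(p * q for p, q in zip(p1, q1))
    base_prices_current_qty = sum(p * q for p, q in zip(p0, q1))
    return current_value / base_prices_current_qty * 100


def quantum_index(v0, v1, p01):
    """Quantum index derived from the value ratio V1/V0 and the unit value index P01."""
    return (v1 / v0) / (p01 / 100) * 100


# Hypothetical data for two articles (unit values in Rs., quantities in tonnes).
p0, q0 = [2.0, 5.0], [100.0, 40.0]   # base period
p1, q1 = [3.0, 5.0], [120.0, 50.0]   # current period

v0 = sum(p * q for p, q in zip(p0, q0))   # total value of trade, base period
v1 = sum(p * q for p, q in zip(p1, q1))   # total value of trade, current period

p01 = unit_value_index(p0, p1, q1)
q01 = quantum_index(v0, v1, p01)
print(f"Unit value index: {p01:.2f}")   # 124.49
print(f"Quantum index:    {q01:.2f}")   # 122.50
```

Because the index is built from current quantities while the quantum index is obtained by division, the two together reproduce the value ratio: $P_{01} \times Q_{01} / 100 = (V_1/V_0) \times 100$.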

One fundamental defect of these index numbers is that they do not include all items in their construction, but only representative items. However, adjustments made in the final index numbers allow for incomplete coverage. The adjustments are based on the assumption that the price changes in the items not covered in a section are similar to those shown by the items included in the calculation for the section.
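One way to apply the stated assumption is sketched below. The unit value index for a section is computed from the representative items only and is then taken to hold for the section as a whole, so the section's quantum index comes from the section's total value ratio. This is an illustrative reading of the assumption, not the documented DGCIS adjustment procedure. The function name `section_indices` is hypothetical and all figures are invented.

```python
def section_indices(p0, p1, q1, section_v0, section_v1):
    """Return (unit value index, quantum index) for a section with partial coverage.

    p0, p1, q1 describe only the representative (covered) items; section_v0 and
    section_v1 are the section's total trade values, including items not covered.
    """
    covered = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))
    p_index = covered * 100
    # Assumption from the text: prices of uncovered items moved like those of the
    # covered ones, so the covered-item index deflates the whole section's value ratio.
    q_index = (section_v1 / section_v0) / covered * 100
    return p_index, q_index


# Covered (representative) items, hypothetical figures.
p0, p1, q1 = [2.0, 5.0], [3.0, 5.0], [120.0, 50.0]
# Section totals include items outside the sample (hypothetical values).
p_index, q_index = section_indices(p0, p1, q1, section_v0=500.0, section_v1=760.0)
print(f"{p_index:.2f} {q_index:.2f}")   # 124.49 122.10
```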
TRY YOURSELF 
1. Point out the utility of Trade Statistics. Narrate the various publications giving information about the trade of India. (M.A. Econ., Jabalpur, 1974)
2. Examine the nature and limitations of statistics relating to foreign trade in India. (M.A. Econ., Punjab, 1972 ; M.A. Econ., Jabalpur, 1975)
3. State the principal sources of statistics relating to the Inland Trade of India and discuss their reliability. (B. Com., Punjab, 1973)
4. Examine the nature, scope and limitations of Trade Statistics of India. (B. Com., Kerala, 1973)
5. Mention the sources and describe the nature of official statistics relating to external trade of India. (B. Com., Delhi, 1975)


Section 5

Labour Statistics


Nature 

The term ‘labour’ has a very wide connotation as it includes all types of workers engaged in trade, industry, commerce and agriculture. By labour statistics we mean factual data on various aspects of labour such as employment and unemployment, the division of workers into skilled, semi-skilled and unskilled categories, the wages and salaries paid to them, hours of work, trade unions, industrial injuries, labour absenteeism, etc. Obviously, such statistics are indispensable in framing suitable labour policies and in providing a basis for the appraisal of current labour problems. Labour statistics serve as a barometer of the labour conditions prevailing in a country and of the steps taken by the Government and employers towards labour welfare. In fact, labour statistics and labour research have been widely used in recent years for overall economic planning.

The International Labour Organisation (ILO) is playing a very important role in making available a uniform set of labour data for all nations so that international comparisons may be facilitated. It has laid down the following broad classifications within which labour statistics should be compiled in every country :

1. Classification of labour industry-wise and occupation-wise.
2. Statistics of employment and unemployment.
3. Wages and hours of work.
4. Levels of living.
5. Family living.
6. Statistics of injuries and occupational diseases.
7. Statistics of trade unions, industrial disputes, etc.
8. Social security.
9. International comparison of real wages.
10. Migration.


Historical Development 


Labour statistics in India are of very recent origin. It was in the year 1931 that the need for the systematic collection of labour statistics was pointed out by the Royal Commission on Labour. The Commission pointed out very clearly that policy must be based on facts and that, so long as there was uncertainty as to the facts, there must be confusion and conflict regarding the aim. The Commission recommended the adoption of suitable legislation enabling a competent authority to collect and collate information regarding the living, working and economic conditions of industrial labour. The recommendation was given effect only in 1942, when the Industrial Statistics Act, 1942, was passed to facilitate the collection of statistics on (a) matters relating to factories, and (b) certain specified areas of welfare and conditions of labour. Arrangements were also made, within the resources available, for processing the data flowing from the Trade Unions Act, 1926 ; the Factories Act, 1934 ; the Payment of Wages Act, 1936 ; and the like.


It is clear from the above account that such statistics on different aspects of labour as emerged were largely a by-product of the administration of various labour laws. However, they were not adequate and were of very little use in framing suitable labour policies. In the year 1946, a Labour Bureau was set up under the Ministry of Labour and Employment. The establishment of this Bureau constitutes a landmark in the history of the development of labour statistics in our country. The activities of the Labour Bureau were enlarged in the years following its inception. Its current functions are listed below :


1. Collection, compilation and publication of labour statistics on an all-India basis ;

2. Construction and maintenance of working class consumer price index numbers for selected centres and the all-India series of consumer price index numbers ;

3. Construction of consumer price index numbers for agricultural workers ;

4. Keeping up-to-date the factual data relating to working conditions of industrial workers collected by the Labour Investigation Committee ;

5. Conducting research in specified problems with a view to supplying data required for the formulation of labour policy ;

6. Bringing out pamphlets and brochures on various aspects of Labour Legislation ; and

7. Publication of ‘Indian Labour Journal’ (Monthly), ‘Indian Labour Statistics’ (Annual) and ‘Indian Labour Year Book’ (Annual), giving authoritative and up-to-date statistics and descriptions of labour affairs in the country.

The Labour Bureau collects its data in three stages : (i) the primary 
data are collected by State Governments or agencies of the Central 
Government, (ii) the returns received from individual units are consoli- 
dated at the State level or agency level according to the standard laid 
down by the Labour Bureau; and (iii) the returns received by the 
Labour Bureau are then consolidated and published in the form of all- 
India statistics. 
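The three-stage flow can be pictured as successive aggregations. The Python sketch below uses hypothetical State and unit names and a single employment figure per return. It also lists States whose returns have not arrived, which is exactly the situation that leaves an all-India total incomplete. The function names are illustrative only and are not part of any Labour Bureau system.

```python
from collections import defaultdict

# Stage (i): primary returns from individual units (hypothetical): (state, unit, employment).
unit_returns = [
    ("State A", "Unit 1", 120),
    ("State A", "Unit 2", 80),
    ("State B", "Unit 3", 200),
]


def consolidate_by_state(returns):
    """Stage (ii): consolidate unit returns at the State level."""
    totals = defaultdict(int)
    for state, _unit, employment in returns:
        totals[state] += employment
    return dict(totals)


def all_india_total(state_totals):
    """Stage (iii): consolidate State figures into an all-India figure."""
    return sum(state_totals.values())


def missing_states(state_totals, expected_states):
    """States with no consolidated return; the all-India figure omits them."""
    return sorted(set(expected_states) - set(state_totals))


state_totals = consolidate_by_state(unit_returns)
print(state_totals)                     # {'State A': 200, 'State B': 200}
print(all_india_total(state_totals))    # 400
print(missing_states(state_totals, ["State A", "State B", "State C"]))  # ['State C']
```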


Available Industrial Labour Statistics 

The labour statistics available from various official sources are 
of a large variety such as statistics of employment and unemployment, 
wages and earnings, industrial disputes, trade unions, etc. A brief des- 
cription of these statistics is given below : 

1. Statistics of Employment and Unemployment. Data relating to employment are compiled by different Labour Bureaux. The following figures relating to employment are available :
(i) Average daily employment in factories.

(ii) Average daily employment in mines.

(iii) Average daily employment in plantations. 

(iv) Employment in Railways and Posts and Telegraphs. 
(v) Employment in shops and commercial establishments. 
(vi) Employment in Central Government establishments. 


The statistics of employment do not cover the whole of the agricultural sector, the cottage and household industries sector, the construction sector, commerce, transport (excluding Railways), services, etc.


With regard to employment statistics, there are three important sources, namely :

(i) Population Census.
(ii) National Sample Survey.
(iii) Employment Exchanges.


The decennial population census constitutes the main source of information for statistics of employment. The census provides data on the working population and its distribution by industry and by occupation. The concepts, definitions and classifications adopted have been changing from one census to another. Another source of information on employment is the National Sample Survey, which collects data on labour force, employment, unemployment and visible under-employment in its various rounds.

The Employment Exchange Statistics contain figures relating to the number of persons seeking work at the end of each month, their classification in broad occupational groups, the number of applicants placed in employment, the number of vacancies notified and the number of persons trained in various centres.


However, these statistics suffer from a number of limitations. First, the existing employment exchanges do not cover the whole of urban India. Secondly, many of the registrants are already employed but keep their names on the live registers in order to improve their prospects. Thirdly, not all unemployed persons register themselves with employment exchanges. Lastly, some registrants come from rural areas, and their share in total registration is not known. Because of these limitations, the statistics of employment exchanges cannot be taken to reflect the trend in urban employment.


2. Statistics of Wages and Earnings. Statistics of wages and earnings serve as an indicator of the economic prosperity of that section of the population which is in paid employment. The statistics on wages are collected under the Payment of Wages Act, 1936, and the Minimum Wages Act, 1948. The Labour Bureau has been collecting data relating to wages and earnings under the following heads :
(i) Manufacturing industries.
(ii) Mines.
(iii) Plantations.
(iv) Transport.
(v) General Government Employees.
(vi) Employees covered under the Minimum Wages Act, 1948.


The Labour Bureau also compiles and publishes an All-India Index Number of Real Earnings of Factory Workers with the year 1951 as base.
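The text does not give the Bureau's formula. Conventionally, a real earnings index is obtained by deflating an index of money earnings by a consumer price index on the same base, here 1951 = 100. A minimal sketch with hypothetical index values:

```python
def real_earnings_index(money_earnings_index: float, consumer_price_index: float) -> float:
    """Deflate a money-earnings index by a consumer price index with the same base year."""
    return money_earnings_index / consumer_price_index * 100


# Hypothetical: money earnings have risen 150% and prices 100% since the base year.
print(real_earnings_index(250.0, 200.0))  # 125.0
```

A value above 100 means that earnings have risen faster than prices since the base year.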


Wage statistics are also collected under the Annual Survey of Industries, wherein the definition of ‘wage’ has been adopted from the Payment of Wages Act. This definition is narrow in application and does not meet present requirements. The new term ‘wage’ includes all contractual payments but excludes all extra payments such as profit-sharing.


3. Statistics of Social Security. Social security services are an essential part of the services being provided to labour. Data on social security services are extremely useful in framing suitable social security policies and in properly evaluating these services. The following data on social security are compiled and published through official agencies :


(i) Statistics of cash and other benefits. Figures are compiled in respect of cash and other benefits given under the Employees' State Insurance Act, 1948. The Act applied to persons getting less than Rs. 500 per month and employed in all perennial factories run with power and employing 20 or more persons. Since November 30, 1975, the limit of Rs. 500 has been raised to cover employees getting up to Rs. 1,000. Benefits are classified under the following categories : (i) Sickness, (ii) Maternity, (iii) Disablement, and (iv) Dependants.

Data are also available in respect of the employees' provident fund, the mines provident fund and bonus in separate tables.
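The coverage conditions stated above can be written as a rule check. This is a simplified sketch. It reads "less than Rs. 500" and "up to Rs. 1,000" literally, uses November 30, 1975 as the switch-over date, and ignores every other provision of the Act. The function `esi_covered` and its arguments are hypothetical.

```python
from datetime import date

# Date from the text on which the wage limit was raised.
LIMIT_RAISED_ON = date(1975, 11, 30)


def esi_covered(monthly_wage: float, as_of: date, perennial: bool,
                uses_power: bool, employees: int) -> bool:
    """Simplified coverage test for the Employees' State Insurance scheme."""
    # Establishment conditions: perennial factory, run with power, 20 or more persons.
    if not (perennial and uses_power and employees >= 20):
        return False
    if as_of >= LIMIT_RAISED_ON:
        return monthly_wage <= 1000   # employees getting up to Rs. 1,000
    return monthly_wage < 500         # earlier limit: less than Rs. 500


print(esi_covered(800, date(1976, 1, 1), True, True, 25))   # True
print(esi_covered(800, date(1975, 1, 1), True, True, 25))   # False
print(esi_covered(400, date(1976, 1, 1), True, True, 15))   # False
```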


(ii) Statistics of maternity benefits. The data collected under this head show :

(a) the average number of women employed, 

(b) the number of women claiming maternity benefits, 

(c) the number of women who were paid the benefit, and 


(d) total amount paid. 
Figures are given separately for factories and mines. 
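The four published items allow some simple derived ratios: the claim rate among women employed, the share of claims actually paid, and the average benefit per woman paid. These ratios illustrate how the figures can be used; the text does not say they are published. The figures below are hypothetical.

```python
def maternity_ratios(avg_women_employed, claimants, women_paid, total_amount_paid):
    """Derived ratios from the four items (a)-(d) of the maternity benefit data."""
    return {
        "claims_per_1000_women": claimants * 1000 / avg_women_employed,
        "percent_of_claims_paid": women_paid * 100 / claimants,
        "average_benefit_rs": total_amount_paid / women_paid,
    }


# Hypothetical factory-sector figures for one year.
ratios = maternity_ratios(avg_women_employed=5000, claimants=250,
                          women_paid=225, total_amount_paid=45000)
print(ratios)
# {'claims_per_1000_women': 50.0, 'percent_of_claims_paid': 90.0, 'average_benefit_rs': 200.0}
```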


(iii) Statistics of compensated injuries. Figures are compiled under the Workmen's Compensation Act, 1923. The figures show : (i) the number of injuries for which compensation was paid, and (ii) the amount of compensation. These two items are further sub-classified into death, permanent disablement and temporary disablement.

4. Statistics of Industrial Disputes. The following data pertaining to industrial disputes are available : (a) number of disputes, (b) number of workers involved, (c) number of man-days lost, and (d) frequency.

Figures are compiled in respect of those disputes which result in the stoppage of work and involve at least ten workers. The coverage of these statistics is very limited as figures are compiled on a purely voluntary basis.


An index number of Industrial Unrest for manufacturing sector 
(Base 1951=100) is also given. In another table a percentage distri- 
bution of number of disputes according to causes is given. Various 
causes leading to disputes are classified into five broad categories, 
namely, (a) Wages and allowances, (b) Bonus, (c) Personnel and Re- 
trenchment, (d) Leave and hours of work, and (e) Others. 


Disputes are also classified according to results as follows : (a) Successful, (b) Partially successful, (c) Unsuccessful, and (d) Indefinite.
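The inclusion rule and the classification by cause can be combined into a small tabulation. The sketch below keeps only disputes that stopped work and involved at least ten workers. It then computes the percentage distribution over the five cause categories named in the text. The dispute records are hypothetical.

```python
from collections import Counter

CAUSES = ("Wages and allowances", "Bonus", "Personnel and retrenchment",
          "Leave and hours of work", "Others")


def recorded(dispute):
    """Only disputes that stop work and involve at least ten workers are compiled."""
    return dispute["stoppage"] and dispute["workers"] >= 10


def distribution_by_cause(disputes):
    """Percentage of recorded disputes falling under each cause."""
    counts = Counter(d["cause"] for d in disputes if recorded(d))
    total = sum(counts.values())
    return {cause: counts[cause] * 100 / total for cause in CAUSES} if total else {}


disputes = [
    {"cause": "Wages and allowances", "workers": 150, "stoppage": True},
    {"cause": "Bonus", "workers": 40, "stoppage": True},
    {"cause": "Wages and allowances", "workers": 8, "stoppage": True},   # too few workers
    {"cause": "Others", "workers": 60, "stoppage": False},               # no stoppage
    {"cause": "Personnel and retrenchment", "workers": 25, "stoppage": True},
    {"cause": "Wages and allowances", "workers": 300, "stoppage": True},
]
print(distribution_by_cause(disputes))
# {'Wages and allowances': 50.0, 'Bonus': 25.0, 'Personnel and retrenchment': 25.0,
#  'Leave and hours of work': 0.0, 'Others': 0.0}
```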


5. Statistics of Hours of Work. Statistics of man-hours of labour input are needed for analysing labour costs, productivity, etc. They are also needed to regulate the hours of work of different categories of workers and to provide for compulsory rest intervals and overtime wages. In India, the hours of work are regulated for workers engaged in the organised sector under various Acts, e.g., (i) the Factories Act, 1948 ; (ii) the Mines Act, 1952 ; (iii) the Plantations Labour Act, 1951 ; etc. These Acts specify overtime in a day and in a week, the weekly off, etc. The statistics of total man-hours worked during the year in each industry are published in the ‘Annual Survey of Industries’. For the unorganised sector no data on hours of actual work are available except from the NSS, where data on hours worked and hours available for work are collected in household surveys in respect of gainfully employed persons for broad occupational groups.


6. Statistics of Labour Absenteeism and Turnover. Absenteeism among Indian labour is a chronic problem. It is measured as the percentage of man-shifts lost due to absence to the corresponding total man-shifts scheduled to work. Generally, absence on account of authorised leave is included, whereas absence due to strikes and lock-outs is excluded. The definitions and methods presently followed by the different agencies are not uniform. Apart from the statistics collected statutorily from all the coal-mines covered by the Mines Act, the other series of absenteeism statistics are collected only on a voluntary basis and are furnished by a few selected large units. Nothing is done about non-response, and the absenteeism rate is calculated from the available returns, which sometimes causes spurious variations in the trend of the absenteeism rate.
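The absenteeism rate defined above is a simple percentage. In this sketch, man-shifts lost are supplied by reason of absence. Authorised leave is counted, while strikes and lock-outs are left out of the numerator. The text does not say whether strike days are also removed from the scheduled man-shifts, so the sketch takes the scheduled figure as given. The figures are hypothetical.

```python
EXCLUDED_REASONS = {"strike", "lock-out"}


def absenteeism_rate(scheduled_man_shifts, man_shifts_lost):
    """Percentage of scheduled man-shifts lost through absence.

    man_shifts_lost maps a reason for absence to the man-shifts lost for it.
    Authorised leave counts as absence; strikes and lock-outs do not.
    """
    absent = sum(n for reason, n in man_shifts_lost.items()
                 if reason not in EXCLUDED_REASONS)
    return absent * 100 / scheduled_man_shifts


lost = {"authorised leave": 600, "unauthorised absence": 900, "strike": 500}
print(absenteeism_rate(10000, lost))  # 15.0
```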


Statistics of labour turnover measure the extent to which old employees leave and new employees join an organisation in a given period. Serial statistics relating to labour turnover are presently available in respect of the cotton textile industry in the States of Gujarat and Maharashtra, and are published in the Indian Labour Statistics. The statistics relate to all employees excluding clerks but including the large force of badlis (substitute workers) employed in the industry to meet the high percentage of absenteeism among the permanent workers.
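The text does not state a turnover formula. One common convention, assumed here, expresses separations (employees leaving) and accessions (employees joining) during the period as percentages of the average number of employees on the rolls. The figures are hypothetical.

```python
def turnover_rates(separations, accessions, average_on_rolls):
    """Separation and accession rates (per cent) for a period."""
    return {
        "separation_rate": separations * 100 / average_on_rolls,
        "accession_rate": accessions * 100 / average_on_rolls,
    }


print(turnover_rates(separations=120, accessions=150, average_on_rolls=2000))
# {'separation_rate': 6.0, 'accession_rate': 7.5}
```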
7. Statistics relating to Trade Unions. Detailed information relating to trade unions is compiled and published by the Government of India, Ministry of Labour, under the Indian Trade Unions Act, 1926. Registration of trade unions is not obligatory under the Act, and as such the scope and coverage of these statistics is limited to registered unions; unregistered unions are not covered. Even among registered unions, about one-half fail to submit the required returns to the State Governments. Hence there is no uniformity in the coverage of these figures, and no comparison is possible under the circumstances. Also, it is difficult to say how far the published data reveal a true picture of trade unions in the country.


Trade unions in India are of two types : (1) those formed by employees, known as Workers’ Unions, and (2) those formed by employers, known as Employers’ Unions. Figures relating to employers’
unions and workers’ unions are given industry-wise and State-wise in 
separate tables. 

8. Statistics relating to Standard of Living. Statistics pertaining to the standard of living are collected in the form of family budget enquiries and retail prices of articles consumed by labourers. The Labour Bureau and the State Governments conducted family budget enquiries for 18 centres and for workers in rice, tea, coffee and other organised industries. These family budget enquiries were conducted at different periods and they form the basis of the construction of the working class cost of living indices. The data compiled give, for each centre and category of workers, the date of enquiry, the average size of the family, consumption units per family, average monthly income and expenditure, and the percentage of expenditure on food.
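Two of the items listed above, the percentage of expenditure on food and expenditure per consumption unit, can be computed from a family's monthly budget as below. The expenditure heads and amounts are hypothetical. The number of consumption units is taken as given, because the text does not state the scale used to convert family members into units.

```python
def budget_summary(monthly_expenditure, consumption_units):
    """Food share (per cent) and expenditure per consumption unit for one family."""
    total = sum(monthly_expenditure.values())
    return {
        "total_expenditure": total,
        "percent_on_food": monthly_expenditure.get("food", 0) * 100 / total,
        "expenditure_per_consumption_unit": total / consumption_units,
    }


family = {"food": 180, "fuel and light": 20, "clothing": 25,
          "housing": 15, "miscellaneous": 60}
print(budget_summary(family, consumption_units=4.0))
# {'total_expenditure': 300, 'percent_on_food': 60.0, 'expenditure_per_consumption_unit': 75.0}
```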


Organisations Concerned with Collection and Publication of Labour Statistics
At present the main organisations concerned with the collection and publication of labour statistics at the Centre are :
I. Labour Bureau, Simla.
II. Directorate General of Employment and Training, New Delhi.
III. Directorate General of Mines Safety, Dhanbad.
IV. Central Provident Fund Commissioner.
V. Employees' State Insurance Corporation.


Labour Bureau. The Labour Bureau, Ministry of Labour, Government of India, continues to be the most important organ for supplying labour statistics on an all-India basis. The Labour Bureau issues a large number of publications, some of which are regular, whereas others are ad hoc studies. The important regular publications of the Labour Bureau are :
1. Indian Labour Journal (Monthly). 
2. Indian Labour Year Book (Annual). 
3. Indian Labour Statistics (Annual). 
4. Trade Unions of India (Annual).
5. Statistics of Factories (Annual). 
6. Pocket Book of Labour Statistics (Annual). 


1. The Indian Labour Journal (Monthly). The Indian Labour Journal is published monthly by the Labour Bureau. The statistics contained in the journal are divided into two sections :


(i) Section A—Monthly statistics. 
(ii) Section B—Serial statistics. 
The monthly statistics are contained in a number of sections : 


(i) Prices and Price Indices—Industrial Workers Consumer Price 
Index, Agricultural Labourers Consumer Price Index, Tripura Tea 
Plantation Workers Consumer Price Index. 

(ii) Employment—Number of Cotton Mills (Spinning departments 
of all mills) by shifts worked and employment in Cotton Mills by State. 


(iii) Employment Exchange Statistics. 

(iv) Absenteeism. 

The Serial Statistics are also contained in a number of sections : 
(i) Price and Price Indices. 

(ii) Employment. 

(iii) Employment Exchange Statistics. 

(iv) Wages and Earnings. 

(v) Productivity. 

(vi) Absenteeism.


2. The Indian Labour Year Book. This is an annual publication of the Labour Bureau, Ministry of Labour, Government of India. The latest year book available is for 1971, and this happens to be the 25th issue since its inception. It provides very useful information on various aspects of labour. The major heads under which information is provided are :

(i) Employment, (ii) Wages and Earnings, (iii) Cost and Level of Living, (iv) Industrial Relations, (v) Labour Welfare, (vi) Industrial Housing, (vii) Health and Safety, (viii) Labour Administration, (ix) Labour Legislation, (x) Agricultural Labour, (xi) Indian Labour Overseas, and (xii) India and the International Labour Organisation.


3. Indian Labour Statistics. This is an annual publication of the Labour Bureau, Ministry of Labour, Government of India. It presents to the public a panoramic view of the large masses of co-ordinated facts that go into the making of labour policies in this country and provide the basis for the appraisal of current labour problems. An attempt is made to present the latest available statistics relating to labour in India. The various statistics have generally been presented from 1951-56 onwards. These statistics cover the following aspects :


(a) Employment in Factories, Mines, Railways, Plantations, etc., (b) Employment Exchanges and Training Centres, (c) Wages and Earnings, (d) Levels of Living, (e) Trade Unions, (f) Industrial Injuries, (g) Absenteeism and Labour Turnover, (h) Industrial Disputes, and (i) Social Security. One section has been devoted to a number of tables relating to various economic indicators, not necessarily restricted to labour topics only.


Limitations of the statistics presented. Official Labour Statistics,
which this publication presents, have certain special features, which
should be borne in mind for interpreting them correctly. Firstly, a
greater part of these statistical data is geared to the administrative
needs of the Government. As such, major readjustments in the
administrative set-up often require some alteration in the scope and coverage,
and hence affect the comparability of the serial statistics. Secondly,
though the periodical collection of the data in many cases is designed
on a census basis, i.e., complete enumeration of the primary units,
what emerges finally is often an incomplete census because of non-response
and delay on the part of some of the State Governments in preparing
and forwarding the consolidated returns to the Labour Bureau.
In the absence of specific statistical data in respect of a few States,
it becomes impossible to complete the all-India statistics. Such incomplete
data should, therefore, be treated and refined very carefully before
they can form the basis for policy formulation. Thirdly, it is
generally not possible to specify any rigorous margin of error for the
data, because the errors arise mainly from factors operating at different
levels of processing, most of which are unfortunately not amenable
to probability treatment. Fourthly, data on different topics are
generally not of identical scope and coverage because they are generated
through different systems of collection. Therefore, for purposes of
co-ordinating one topic with another, a reasonable degree of comparability
between the different statistical series must be established by
careful sifting of the available data. Statistical inference in such
situations is bound to be hazardous and the assessment needs to be
attempted with extreme caution.

4. Trade Unions in India. This is a biennial publication of the
Labour Bureau, reviewing the working of the Trade Unions Act, 1926.

The information contained in this volume reflects a general picture
of the growth of trade unions, their finances, activities, etc., during
the period under review.

Besides these regular publications of the Labour Bureau there
are also a large number of ad hoc publications of the Bureau in the
form of surveys and studies on different aspects of labour. They also
contain factual data. These publications are divided into the following
heads :
(i) Family Budget and Family Living Enquiries of Industrial
Workers—Reports.
(ii) Survey of Labour Conditions—Report on Industries.
(iii) Survey of Agricultural Labour.
(iv) Occupational Wages Survey.
(v) Women and Child Labour.
(vi) Agricultural Labour Statistics.
(vii) Central Labour Survey—Reports.
(viii) Index Numbers.
(ix) Labour Laws and Awards.
(x) Miscellaneous.

II. Directorate General of Employment and Training
(DGET)—Ministry of Labour, New Delhi. The Directorate General 
of Employment and Training (DGET) has been collecting and analysing 
data on occupational pattern of employees in public and private sectors. 
The DGET collects data through the Employment Exchanges under the 
Employment Market Information (EMI) programme. 

The EMI is intended to provide information on the structure and 
disposition of labour force and trends in employment in different indus- 
tries and occupations. The data are published in a journal entitled 
Quarterly Employment Review. The journal gives information under
four heads : 

A—Employment : Public and Private Sectors. 

B—Employment : Industry-wise Analysis. 

C—Zonal and State-wise Analysis. 

D—Women Employees.

Besides, it also contains data on employment outlook and manpower
supply and demand. The data on manpower supply and
demand are divided under four heads : (a) Work-seekers, (b) Vacancies
Notified, (c) Placements, and (d) Manpower Imbalances.

Other publications of the Directorate General of Employment and
Training are :

1. Area Employment Market Report (Quarterly)

2. State Employment Reports (Quarterly)

3. Report on Occupational Pattern of Employees (Public and Private
Sectors separately) (Biennial)

4. Annual Area Employment Market Report (Annual)

5. Report on Biennial Survey of Establishments ... Workers in Private
Sectors (Biennial)

III. Director-General of Mines Safety. The Director-General
of Mines Safety issues a Monthly Coal Bulletin which gives an
up-to-date picture of the various aspects of the industry such as labour
employment in the collieries, their wages and hours of work, productivity,
etc.

Statistics about such mines as come under the Mines Act
can also be obtained from the Annual Reports.
Gaps in Labour Statistics 

The Labour Statistics, as indeed all statistics, are generally collected
to meet the needs of administration. In some cases, the furnishing of
such statistics is a statutory obligation on the employer or a trade
union; in others they are gathered in the course of normal administration.

The National Commission on Labour which submitted its report 
to the Government in 1969 observed that several inadequacies pertain- 
ing to labour statistics have come to light affecting different statistics 
differently. A major part of labour statistics is today collected
through statutory returns prescribed under the labour laws. The
statutory returns are mostly annual and the statistics get compiled
on an all-India basis with a time-lag of 2 to 3 years. Though the
returns are statutory there is usually a large degree of non-response 
which goes on varying from year to year. Periodic statistics collected 
on a voluntary basis, such as statistics of industrial disputes and 
absenteeism, suffer from limitations of a different type. The establish- 
ed series of statistics of absenteeism for the manufacturing sector, 
for instance, are compiled on the basis of monthly returns from 
selected units in important industries at a few centres. The statistics 
are not comprehensive and their representative character is open 
to doubt. 

Deficiencies in labour statistics arise inter alia from : 


(i) inaccuracy and unreliability owing to (a) poor response, (b)
failure of primary agencies to send accurate reports, (c) handling of
data by untrained staff, and (d) inadequacy of staff;

(ii) variety of definitions of the same term in different statutes.
For example, the term 'wages' has been defined differently in different
statutes;

(iii) varying response from agencies which supply data; and

(iv) delayed publication.

The deficiencies pointed out above are interrelated. For instance,
delayed publication is the result of poor response, failure to
receive accurate reports and the correspondence consequent thereon.

Varying response of primary agencies can result from a variety of
definitions and operational concepts as well as from the handling of
queries by inadequate or inexperienced staff. While some deficiencies
can be remedied by governmental action, others like poor response
are a matter of education and training for field staff as much as for
persons in the establishments which are expected to fill in the returns.


Based on the findings of the National Commission on Labour, 
the views of the Chief Labour Commissioner and the present gaps in 
labour statistics, the following suggestions can be made to remove 
some of the deficiencies in labour statistics in our country : 

(1) The forms prescribed by authorities for seeking information 
are cumbersome and considerable rationalisation is possible in them. 
In several cases, by only marginal addition to information collected 
on some forms, many others could be made redundant. 


(2) Some statistics collected by the Central and State agencies
purely to fulfil the statutory and administrative requirements never
see the light of day, nor are they used by policy-makers. They occupy
much-needed space in Government offices for long years and are
then destroyed. This is a waste of national effort and resources.
Advance planning is needed in this regard. For this, the National
Commission on Labour has recommended the constitution of a standing
council consisting of the agencies of Government in charge of collection
of statistics, representatives of employers' and workers' organisations
and research institutions, which will periodically review the
requirements of statistics to be collected.


(3) Data on labour productivity, unemployment and wages
are scanty and need considerable improvement. Data on labour
productivity are not compiled either by the Government or by any
private agency. Data on wages lack comprehensiveness. Besides
money, workers in most cases also receive other benefits such as
free residence, uniforms, shoes, etc. Data on these aspects should
also be made available.

(4) There is an urgent need for uniform definitions of the 
same term in different statutes. 

(5) Strict action should be taken against those organisations which 
do not fill up the statutory returns. 

(6) Every effort must be made to reduce the time-lag between 
obtaining the information and getting it published so that it could be 
more useful in framing suitable policies. 

(7) There is a pressing need for bringing out important economic
indicators like index numbers of employment, wage rates and
earnings at quarterly intervals. Expeditious action should be taken to
organise these series on a statutory basis. Timely preparation of these
series can be ensured by collecting data from a well-designed sample of
establishments.

(8) There are gaps in labour statistics with regard to employees
in the unorganised sector such as small shops and commercial
establishments and small-scale industries. The question of filling up these gaps
is of high priority and the matter should be examined by the Central
Government in consultation with State Governments.

(9) The collection of social and sociological data on workers' life
such as information on aptitude to work, reaction to work environ- 
ments and study of the problems of displacement of workers should 
find a place in the future programme for development of statistics. 


However, it should not be forgotten that the availability of, and gaps
in, labour statistics, as in the case of all other statistics, have to be
viewed in the context of how much the community can afford to spend 
on satisfying its thirst for information. 


TRY YOURSELF 
1. Discuss the sources, scope and limitations of statistics relating to
industrial labour in India. (B. Com., Delhi, 1970)
2. Write a detailed note on the availability of statistics relating to various
aspects of labour in India. (B. Com., Punjab, 1972)
3. Critically examine the official statistics available on industrial labour in
India with special reference to data on wages and earnings, union organisation,
social security benefits and industrial disputes. (M. Com., Delhi, 1973)
4. Comment on the nature and scope of statistics relating to labour in
India. State their limitations and give suggestions for their improvement.
(B.A. Hon. Econ., Delhi, 1975)
5. Review the sources of official statistics available to us to find out
employment and wages in industries. (B. Com., Madras, 1974)
6. Write a note on the nature and sources of Labour Statistics in India.
(B. Com., Delhi, 1976)


Section 6

Population Statistics


Population statistics are the oldest of all statistics collected by 
nations. In ancient times when there was hardly any statistical organi- 
sation the leader of the tribe or group estimated his manpower to 
protect himself and his group from other tribes. Later, kings collected 
statistics of human population in order to safeguard their territory 
from foreign attacks. However, in the present era of aerial warfare 
and nuclear armament population statistics for army purposes have 
lost their significance. Today these statistics serve a different purpose 
—they focus attention on various socio-economic problems of the na- 
tions and help in the formulation of suitable policies. 


When we talk of population statistics, we mean thereby factual 
data on the number of persons living in the country, division into 
males and females, marital status, working and non-working popula- 
tion, nationality, religion, age, etc. 

There are three methods which are generally employed to collect
population statistics :

1. Population Census. In most countries a population census 
(complete count of all the people alive on a particular day) is conducted 
once in ten years under State direction. 

2. Registration of births and deaths. In almost all the countries
of the world a record of births and deaths, called vital statistics,
is maintained.

3. Ad hoc surveys. Ad hoc surveys, known as demographic surveys,
are conducted by official agencies of a particular region to collect
the required information.

The population of a country can be enumerated accurately with 
the help of vital statistics alone, provided a correct and up-to-date 
record of births and deaths is maintained in the country. However,
generally these records are not complete and, therefore, vital statistics 
alone fail to provide an accurate idea about population growth. Popu- 
lation census deals with actual counting of all persons located in each 
area of the country on a particular day and collecting details regarding 
their age, sex, economic standard, occupation, nationality, etc.
Population census provides us with the number of individuals inhabi- 
ting a country on a particular date while record of vital statistics pro- 
vides us with such information at any time we require. Though 
population census and vital statistics are two independent methods for
estimating the population growth, generally both the methods are used 
together. The demographic survey is not an independent method 
and generally it refers to a particular region or sector only. A brief 
description of census and vital statistics is given below : 




Population Census 


According to the U.N. document ‘Principles and Recommendations
for National Population Census’, a census of population may be defined
as “the total process of collecting, compiling and publishing demographic,
economic and social data pertaining, at a specified time or times, to all
persons in a country or delimited territory.”

Census today is not merely counting of heads. Census data are 
extremely useful, nay indispensable, in framing suitable policies and in 
the absence of this basic data most of the policies would be a leap in 
the dark and planning almost a failure. Such statistics are useful for
economists, sociologists, businessmen, trade, industry and commerce. 
They provide valuable material for public health authorities for disease 
control and rationing authorities for estimating the requirements of 
food supply. 

The economist by studying the trend of population growth, its 
occupational structure and increase in rural and urban population is in 
a position to make valuable suggestions to the administration of the 
country. He can trace the correlation between population growth and 
food supplies, occupational changes and grant of protection to industries, 
increase in urban population and decay in rural population and make 
suggestions for changes in the State policy for proper and balanced 
growth of the country. 

The sociologist studies the directions of reforms in matters of
disease control, marital status, age at which people should marry, 
family planning, etc. Schemes of social insurance, urban and rural 
welfare all depend on facts supplied in population statistics. To 
businessmen population statistics supply useful information. The know- 
ledge about consumers, their classification according to income, loca- 
tion, etc., helps businessmen in launching sales campaign, market 
expansion possibilities, organising publicity drives and deciding the 
media of advertisement. Population statistics also serve trade, industry
and commerce of the country to a considerable extent. Hence it is 
rightly said that population census is not merely counting of heads: it
supplies a heap of valuable information which is of great significance
in different spheres.


Methods of Conducting Population Census 


Broadly speaking, there are two methods of conducting a population
census :


1. De Facto method, and 
2. De Jure method. 


In the de facto method all persons are counted in the area where 
they are physically found on the day or night of the census. Thus, under 
this method all persons living in the country are counted simultaneously
wherever they are found on the census night. In the de jure
method, on the other hand, the population of each area is defined as 
persons who usually reside in the area regardless of their actual location 
at the census date. Thus, under this method people who are temporarily 
away from their normal place of residence on the census night are 
enumerated at the place of their normal residence. It should be noted 
that the de facto method is also known as the “one-night enumeration
method” or, simply, the “date system”, for the reason that in this method
enumeration has to be done simultaneously in one night. On the other
hand, the de jure method is also referred to as the “period system of
enumeration” for the reason that the census enumeration is not done
in a single night but is spread over a period though the figures relate 
to a particular date. 


Some countries follow de facto method, others de jure method 
while still in others various combinations and modifications of these 
concepts are used. The merits and demerits of these methods are 
discussed below : 


Merits of De Facto Method. (i) This method of enumeration 
is simple. Enumerators or respondents can simply be instructed to 
report all persons who are present in each dwelling or other place of 
enumeration at the specified time. No elaborate instructions are required 
with reference to the inclusion or exclusion of persons who are away 
from home under different circumstances or for different reasons. 

(ii) The other advantage of the de facto method is the objectivity 
of definition. There are no borderline cases of persons who may be 
enumerated or not depending upon the judgment of the enumerator or 
respondent. 

(iii) “For purposes of international comparability it is desirable
that a de facto enumeration may be made, that is, a count of all persons 
present in the country at the time of enumeration. Any data on a de 
jure basis which may be desired should be obtained in addition to the 
de facto data." — Recommendation of the International Agency. 

(iv) The de facto concept is especially advantageous from the 
standpoint of international comparability, for it is an unequivocal
standard that can be applied universally without regard to differences
in local conditions. 

(v) De facto figures would be useful to municipal authorities 
analysing the requirements for police protection, medical service, 
hygiene and transportation and to businessmen estimating the demand 
for products or services purchased by residents and visitors alike. 


Limitations of De Facto Method. (i) The simplicity of the
concept is marred by the exceptions for military and diplomatic personnel
which have been recommended by international agencies. Under
the de facto method we also count those persons who do not actually
belong to this country but are living in it, and we exclude
those who are living outside the country. Thus we do not get the
actual figure: we include a portion of those people in whom we are
not interested and exclude a portion of those people in whom we are
interested.

(ii) Census cannot be completed just overnight. A very large
number of persons will have to be employed to carry out the job and
this would involve huge expenditure. If the census-taking is spread
over a number of days, it would result in duplication, etc. Thus de
facto population census can also be defective.


To remove this defect, the population is generally recorded over the
first ten or fifteen days, or any other number of days (varying in different
countries according to their practice), and corrections are then made on
particular days.


(iii) Again, there is a difficulty with regard to persons moving
in trains, etc., from one place to another: how they should be counted
and which place should be taken as their place of residence, whether
the place where the train is, the place they are going to, or the
place they are coming from.


Merits of De Jure Method. (i) The de jure population is the 
permanent population, which is what is desired. 


(ii) A de jure enumeration may give more accurate results than a
de facto one if the enumeration is to be spread over a considerable time
interval. Among the mobile population the dangers of double counting 
or omission multiply as the period of enumeration is extended. In parti- 
cular a de facto count of a very mobile population is subject to serious 
error if the enumeration is spread over a long period. A de jure count 
under such conditions is also likely to be inaccurate, but the errors in 
this case may be smaller if most of the people who move during the
interval retain a stable place of usual residence. 

(iii) A de jure enumeration is necessary in some countries to
fulfil certain legal requirements, notably to provide statistical bases for
the apportionment of electoral representatives, tax assessments, monetary
grants, etc., among the areas of the country. In some countries the legal
authority for the census explicitly requires a de jure enumeration. 


(iv) Local requirements for population statistics in connection 
with the planning and administration of certain special services such as
housing and education programmes which apply only to the resident 
population are better fulfilled by de jure than by de facto figures. There 
are also other uses for which de jure figures seem preferable, including
perhaps the provision of base figures for the calculation of birth and
death rates. 


Limitations of De Jure Method. (i) In a de jure enumeration
more complicated definitions and instructions are necessary to ensure 
complete and consistent reporting. Generally speaking, the purpose 
in this kind of enumeration is to allocate each individual to his place of 
residence, though there are many difficulties in the way. The main 
difficulty is that persons do not have a fixed single place of abode. 


(ii) There are greater chances of errors, omissions and duplication.
Examples are children away at school, individuals or families
that have left home for long or indefinite periods to travel or work 
elsewhere, people who have no fixed residence, families that maintain 
two residences, one in the city and another in the country. 


Even though detailed instructions and other special measures are 
taken to ensure a uniform basis for their enumeration, such persons are 
likely to be omitted or counted more than once. Even those persons 
who have a definite place of residence, but who are absent at the time 
of the census may be omitted, counted more than once or assigned to 
the wrong area, if the persons who report for them are not well 
informed. 


For a country as a whole, the difference between a ‘de facto’ and a
‘de jure’ count is likely to be small under ordinary circumstances. The
difference between the number of residents temporarily outside the 
country and the number of non-residents temporarily within it 
ordinarily amounts to only a small percentage of the total population. 
Under some conditions, however, the difference may be substantial. 
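Writing the two national counts side by side makes the size of the gap explicit. The symbols below are introduced only for this illustration and are not the book's notation; the identity assumes that every person is counted exactly once under each method and ignores the special treatment of military and diplomatic personnel:

```latex
P_{\text{de jure}} = P_{\text{de facto}} + R_{\text{away}} - N_{\text{present}},
\qquad
P_{\text{de jure}} - P_{\text{de facto}} = R_{\text{away}} - N_{\text{present}}
```

Here R_away is the number of usual residents temporarily outside the country and N_present the number of non-residents temporarily within it. The gap stays small whenever both groups are small relative to the total population, or when they roughly offset each other.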


De facto definition is much more suitable as compared to de jure 
definition for uniform application in different countries. The purpose, 
for which de jure statistics are most appropriate, appears to be primarily 
national and internal rather than international.


A better method of obtaining both the de facto and de jure figures, 
which provides for the inclusion in the de jure count of persons tempo- 
rarily out of the country, is to enumerate at each dwelling both the 
persons who usually reside there but are temporarily away and those 
who are present but have a usual place of residence elsewhere. These 
two categories must be separately identified on the census schedules. 
The accuracy of the results may be improved by recording for the 
former group their location at the time of the census and for the latter, 
their usual place of residence. 
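The combined procedure just described can be sketched in Python. Everything here (the `ScheduleEntry` record, its field names and the sample household) is hypothetical and only illustrates the counting rule: each dwelling's schedule lists usual residents who are present, usual residents temporarily away, and visitors who are present, and both counts are derived from those two flags.

```python
from dataclasses import dataclass


@dataclass
class ScheduleEntry:
    """One person on a hypothetical dwelling schedule."""
    name: str
    present: bool          # physically present at the dwelling on census night
    usual_resident: bool   # this dwelling is the person's usual place of residence
    other_place: str = ""  # where an absent resident is, or a visitor's usual residence


def de_facto_count(entries):
    """Count everyone physically present, wherever they usually live."""
    return sum(1 for e in entries if e.present)


def de_jure_count(entries):
    """Count everyone who usually lives here, even if temporarily away."""
    return sum(1 for e in entries if e.usual_resident)


def missing_other_place(entries):
    """Names of absent residents or visitors whose other place was not recorded."""
    return [
        e.name
        for e in entries
        if e.present != e.usual_resident and not e.other_place
    ]


schedule = [
    ScheduleEntry("Head of household", present=True, usual_resident=True),
    ScheduleEntry("Son at college", present=False, usual_resident=True,
                  other_place="Delhi"),
    ScheduleEntry("Visiting uncle", present=True, usual_resident=False,
                  other_place="Lucknow"),
    ScheduleEntry("Visiting aunt", present=True, usual_resident=False),
]

print("de facto:", de_facto_count(schedule))          # de facto: 3
print("de jure:", de_jure_count(schedule))            # de jure: 2
print("incomplete:", missing_other_place(schedule))   # incomplete: ['Visiting aunt']
```

Summing such per-dwelling counts over all enumeration areas would give the two totals. The `other_place` check mirrors the point made in the text: recording where absent residents are and where visitors usually live lets incomplete entries be flagged and followed up.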


Methods of Enumeration

There are two important methods of enumeration : 

1. Canvasser method or enumeration by appointed enumerators. 
The information is obtained by a personal interview and entered on the 
schedule by officially designated enumerators.

2. The Householder method or self-enumeration. Under this
method schedules are distributed beforehand to all households, along 
with the instructions for filling them. The head of the household is 
responsible for entering the information and the schedules are collected 
by officially designated persons.

A combination of the two methods has also been used, part of the
information being filled in by the respondents, and part by the officially 
designated enumerators. 

The following chart gives details of the methods of census taking : 


METHODS OF CENSUS TAKING

De Facto Method
    De Facto Canvasser Method
    De Facto Householder Method
De Jure Method
    De Jure Canvasser Method
    De Jure Householder Method


Population Census of India 

The Indian census today is universally acknowledged as the most 
authentic and comprehensive source of information about our land and 
people. The history of census-taking in India dates back to the year 
1872, when the first census was taken. This census was limited in 
coverage of geographical territory and information collected was also 
not complete. The first complete census of the whole country was taken 
on Feb 17, 1881, and since then a census of population has been taken
every 10 years. The last census relates to the year 1971. Over the last 
100 years the census operations have been keeping up with the varied 
demands of changing times. A greater degree of sophistication has
come into our methods of collection, tabulation and analysis of statis- 
tical data during this period. 


Role of Census 
In developing countries like India, the periodical population
census is the best source for obtaining a frame of area units suitable for
carrying out sample surveys covering various fields such as demography,
agriculture, and social and economic statistics during the intercensal
period. In fact, by providing a suitable sampling frame, a population
census facilitates, through sample surveys, the collection of a host of
statistics at national and regional levels, which the census itself
cannot furnish except for some basic items.


The Census Act of 1948 


Before independence there was no permanent census legislation. 
In 1948 a permanent Census Act was passed and the practice of 
enacting a fresh legislation every 10 years at the time of the census 
was done away with. The Indian Census Act, 1948, provides the
necessary legal authority for the conduct of the census. The Act
provides for the appointment of census staff and for calling on the
public for assistance in taking the census. The law requires every individual
to answer the census questions truthfully and also authorizes the
census enumerator to enter a house and interrogate people. But
the census enumerator is prohibited from asking for information on any
matter not necessary for the purpose of the census, such as the income
or wealth of a person. The Act enjoins the census-takers to record
the answers correctly and keep the information strictly confidential.


Census Organisation 


Till 1961, the census organisation in India was a veritable Phoenix.
A census commissioner for the country was appointed on the eve
of each decennial census; he planned the census operations,
created a huge census machinery, organised the census enumeration,
processed the data and produced the reports. The census organisation
was then wound up, only to be revived on the eve of
the succeeding census. This ad hoc arrangement created several
difficulties. For example, every time the census was taken the requisite
machinery had to be set up afresh. Also it did not allow for post-census
continuity, so necessary for attending to questions of improvement in the
quantity and scope of census data. Since 1961, however, the census
organisation, at least a nucleus unit, has continued during the
intercensal period. Now the permanent census organisation has come
to stay.
The entire census organisation functions under the direction
of the Registrar General and ex-officio Census Commissioner for India
who is assisted by a Superintendent of Census operations in each 
State. The Census Superintendents build up the census organisation 
for each district. An officer at the headquarters of each district,
generally a Deputy Collector, is placed in charge of the census arrangements
for the district and is designated as the District Census Officer. Below him
are the taluk/tehsil/thana census officers. If the taluk/tehsil/thana is 
small, it is constituted into one census charge, if it is big, it is divided 
into a number of census charges, each with a separate charge officer. 
Each charge is divided into a number of circles, each under a circle 
supervisor. The circles are further subdivided into enumeration blocks, 
each under an enumerator. Thus the enumeration block is the ultimate 
territorial unit for the purpose of census count, and it is the census
enumerator who approaches every household within his block during 
the period of enumeration to fill in the census questionnaire. At the 
1971 census the total number of enumeration staff of all categories was 
of the order of one million. Census officers up to the level of charge 
officers are selected from among Government servants, mostly from the 
Revenue Department. The enumerators are generally drawn from the 
school teachers, the village officers, clerical staff, etc. They all work 
on an honorary basis in addition to their normal duties. Only a small
honorarium is paid to them. At the 1971 census only a small amount 
of Rs. 25 was paid to each enumerator as an out-of-pocket allowance. 


Census is a great national undertaking requiring the co-operative and
patriotic effort of one and all. 


1951 Census 
The 1951 census was the first census of free India and the census
authorities received full co-operation from the people. Up to the 1931
census, the census information was first recorded on a schedule and
later transcribed on special slips for purposes of sorting and
classification. For the 1941 and 1951 censuses this procedure was discontinued
and the information was directly recorded on enumeration slips. The
1951 census enumeration slips, one for each person enumerated,
contained the following fourteen questions :

1. Name and Relationship to the Head of Household......
2. (a) Nationality......(b) Religion......(c) Special Groups......
3. Civil condition......
4. Age......
5. Birth-place......
6. Date of arrival of displaced persons and their District of
origin in Pakistan......
7. Mother tongue......
8. Bilingualism*......


*By bilingualism in question 8 was meant any Indian language other than
the mother tongue which a person commonly spoke. Only one language was to
be recorded under bilingualism even if a person knew two or more languages
other than the mother tongue. Further, the language had to be an Indian one.
9. (a) Dependency......(b) Employment......
10. Principal means of livelihood......
11. Secondary means of livelihood......
12. Literacy and Education......
13. Unemployment......
14. Sex......


There were thirteen questions common to all parts of India and
one question was included by each State according to its own choice.
In U.P. this special question related to unemployment. In Bombay
and some other States the question related to fertility. The information
collected in the 1951 census included almost all the items which the United
Nations Organisation wanted the various countries of the world to
include in their lists. The only topic about which information was not
collected on an all-India basis was fertility, and even here some
States like Bombay and Tamil Nadu included a special question in their
lists.


Changes introduced in 1951 Census. The following is a brief 
summary of the changes introduced in 1951 census : 


1. The Census Act passed in 1948 for the census of 1951 was
made a permanent law, and the system of enacting fresh legislation
for census taking every 10 years was done away with.


2. The office of the census commissioner was created on a permanent
basis. Previously this office used to be wound up every time the
census operations were over.


3. The period of enumeration, which was one week in the 1941 census,
was increased to 3 weeks in the census of 1951.


4. For the first time in the 1951 census a distinction was made between
“house” and “household”. In the census of 1951, a house was
defined as a dwelling place with a separate main entrance, and a household
was defined on the basis of the chulha (common kitchen), i.e., as a group
of people who lived together and took their meals from a common kitchen.

5. A National Register of Citizens was prepared for the first time
in the 1951 census. It serves as a national inventory for various official
and non-official investigations. The register was prepared by copying
information about individuals from the census slips. Every
village and every ward of a town had a register of its own which was
considered to be a part of the national register.


6. Population was classified for the first time on the basis of
(a) Dependency, and (b) Employment.


[...] about divorced people, displaced persons, etc. An attempt was also
made for the first time to get an idea of the economically active
population of the country.
1961 Census


The 1961 census was the ninth decennial census of the
country and it coincided with the [...] plan. The special features of
the 1961 census were :

1. The distinction between house and household which was made
in the 1951 census was retained and made more elaborate. Three concepts
were used: (a) Building, (b) Census House, and (c) Household. What was
defined as a house in the 1951 census was divided into two categories of
‘building’ and ‘census house’. The word ‘building’ referred to an entire
structure raised on ground. The term ‘census house’ referred to a building
or a part of a building having a separate main entrance not necessarily
leading to a road or lane. Thus one building could have a number of census
houses, each with its own separate entrance. The definition of household [...]

2. For the first time in the 1961 census separate slips were used for
households and individuals. In the household schedule, information was
collected about the households engaged in (a) cultivation, (b) household
industries, or (c) work as labourers in either cultivation or household
industries or both. Such statistics had never been collected in the
past. A large variety of useful information was collected on individual
slips, one slip being meant for one individual.

3. The occupational classification adopted in this census was
different from those adopted in earlier censuses. For the first time in
this census the whole population of the country was divided into two
broad categories of “working” and “not working”.

4. In the 1961 census the house list was also extended considerably to
include a large variety of information. At the time when the house list
was being prepared, information was collected about the purpose for
which a house was used, namely, for residence or for a shop, workshop,
school or any other institution, etc. In case a house was used as a
workshop or factory, further information about the number of persons
employed, the type of work done and the kind of fuel or power used was
also noted down.

5. The census covered for the first time Jammu and Kashmir and 
other snow-bound areas. 

6. Though the actual census enumeration started on February
10, 1961, the ancillary work had started nearly a year earlier. The
questionnaire used on this occasion was tested at a ‘test
census’ held in Delhi, Bombay, Calcutta and Madras.

7. Some questions were dropped altogether, for example, the question
relating to displaced persons. Marital status of persons was
recorded in greater detail than in the previous censuses and a separate
card was prepared for technical and scientific personnel.


The Census Questionnaire 
At the 1961 census the following three documents* were prepared :

I. House List.
II. Household Schedule-cum-Census Population Record.
III. Individual Slip.

*Specimens of these documents are given in an appendix at the end of this
section.


I. House List. The house list was prepared six to nine months
before the enumeration and all the houses and households were listed
for purposes of enumeration. In the 1961 census a standard form was
adopted for the first time throughout the country, designed to collect
information on the use to which a census house was put, the material of
the wall and of the roof of a census house, whether the census house was
owned or rented and the number of rooms, if the house was used for
dwelling, together with essential data concerning houses that were used
as establishments, workshops or factories, like name of establishment or
proprietor, name of product(s) produced, repaired or serviced, number of
persons working and kind of fuel or power, if machinery was used.


II. Household Schedule. The household schedule was introduced for the
first time in the 1961 census, no data on the household as such having
been collected in earlier censuses. The object was to find out the number
of people engaged in agriculture, in household industries or in both.
Information was also collected on whether they were employees. At the back
of the Household Schedule there was a Census Population Record of the
persons counted at the census in each household, giving an abstract of
information for each individual enumerated in the household, i.e.,
name, sex, relationship to head of household, age, marital status and
description of work, if working.


The following questions were included : 


(i) Is the household an institution, as distinguished from a family
household, i.e., a jail, a hospital, a hotel, etc.? The idea was to find
out the number of households which were different types of institutions
as distinct from households which were family units.


(ii) Name of the head of the household, i.e., the name of the person
on whom fell the chief responsibility of maintaining the household,
who need not be the eldest or a male member of the family.
In hostels, hospitals, etc., the Superintendent was taken as the head of
the household and such households were classed as “households of
unrelated persons”.


(iii) Does the household belong to scheduled castes or tribes?


(v) Households engaged in cultivation and/or household industries
and details of persons working in them. This section of the household
slip was divided into the following three parts : (a) cultivation,
(b) household industry, and (c) workers at cultivation and/or household
industries.


Section A was filled in by those households which had agricultural
land which was cultivated by the members of the household or
given to others for cultivation. This section showed the total area under
cultivation, classified into :

(a) Area owned or obtained from Government by the household
and cultivated by the members of the household ; and

(b) Area obtained from other people but cultivated by the members
of the household ; thus giving the total area cultivated by the
members of the household.


Section B was filled in only by those households running an industry
with the following characteristics :

(a) it produced goods or performed ancillary services such as oiling,
cleaning or repairing of the goods produced ;

(b) it was of a family size, mostly the members of the household
participating in its activities ;

(c) it was not a registered factory, although it might be using
power ; and

(d) it was located at the residence of the proprietor if in an urban
area, or anywhere in the village if in a rural area.


Statistics were collected about the number of months (approxi- 
mately) for which such industrial units functioned in a year. 


Section C dealt with the number of persons engaged in household
industries or in cultivation, or in both, specifying whether the
head of the household was working or not and which other members of
the household worked, and giving separately the number of male and
female workers and of hired labourers.


It will be clear from the above account that for the first time in
1961 an attempt was made to find out the number of households and also
the number of people who were working in agriculture or household
industries or in both. A distinction was also made as to whether they
were owners or only hired labourers.


III. Individual Slips. On individual slips statistical information
was collected separately for each individual in the country.
One slip was filled in for one individual only. The information collected
in the individual slip related to the following three main aspects :


1. Demographic data like relationship to head of household, sex,
age, marital status and birth-place ;

2. Social and cultural data like nationality, religion, literacy and
mother tongue ; and

3. Economic data like occupation, industry, class of worker, and
activity, if not working. The individual slip contained 13 questions
in all, given below :

1. (a) Name. The name of the person to whom the slip related
was noted down. If the name of a lady was not disclosed, then in place
of her name she was referred to as “the wife or mother or daughter of so
and so”. If the informant was a lady and she did not want to speak her
husband's name, then her husband's name was recorded as “the husband
of so and so”. A newly born baby who had not been given any name was
recorded as “child”.
(b) Relationship with the head of the household. The actual relationship
to the head of the household was mentioned. The head of the
household was not necessarily the eldest member nor necessarily a male;
he was defined as the one on whom fell the chief responsibility for the
maintenance of family members. In case the slip related to a
household such as a hostel or hospital, where the Superintendent was taken
as the head of the household and the individuals were not related to him,
it was mentioned that the individuals were not related to the head.


2. Age on last birthday. The age of each individual was record- 
ed as on last birthday. The age was recorded in complete years only. 
The age of children below one year was recorded as zero. 


3. Marital Status. People were classified as : 
(a) Never married, 

(b) Married, 

(c) Widowed,

(d) Separated or divorced. 


A married person was one who had been married once or more
than once and whose wife or husband was alive on the date of the census.
Even those persons who were recognised by custom or society as married,
or who were living as husbands and wives even though no formal
marriage had been performed, were treated as married for the purpose of
the census. A widowed person was one whose husband or wife was dead
and who had not married again. Persons whose marriage had been dissolved
either by a decree of a law court or through recognised social and
religious custom, and who had not remarried, were treated as divorced.
Those husbands and wives who had separated without any intention of
reunion were also included in the category of separated and divorced.
Prostitutes were classified according to the answers which they gave in
reply to the question about marital status. In earlier censuses
prostitutes were treated as unmarried.


4. Place of birth. Data about place of birth were collected on 
the following basis : 
(a) Born in village or town in which enumerated. 


(b) Born in another village or town of the district in which
enumerated.


(c) Born in another district of the State in which enumerated. 

(d) Born in another State of India. 

(e) Born in another country. 

(f) Born on sea, air, railways or road vehicles. 

(ii) Whether born in village or in town. This information was 
collected separately from the information of question 4 (a) above. 
Persons born in places which were not considered a town at the time 


of their birth but were in the category of town at the time of census 
were considered to have been born in town. 


(iii) Duration of residence, if born elsewhere. This information
was collected about those people who were not born in the village or town
in which they were enumerated. If a person was born in any other
village or town of the same district where he was enumerated, or if he
was born in another district (of any State) or in any other country, his
length of residence at the place of enumeration (in complete years) was
noted down. If the period of stay was less than one year, the length
of residence was recorded as zero.

5. (a) Nationality. If a person had a nationality other than
Indian, the name of the country of his nationality was noted
down.

(b) Religion. Data were collected about all religions, but symbols
were assigned only to the Hindu, Muslim, Christian, Jain, Buddhist and
Sikh religions. For other religions, the actual name of the religion was
noted down.

(c) Scheduled Castes and Scheduled Tribes. The answer to this 
enquiry was recorded only if a person belonged to the scheduled caste 
or scheduled tribe. 


6. Literacy and education. The following information was col- 
lected about literacy and education : 

(a) Persons who could neither read nor write or who could read 
but not write. Such persons were treated as illiterate. 

(b) Persons who could both read and write. Only such persons 
were treated as literate and the test of literacy was whether a person 
could read and write a simple letter. 

(c) If a person was literate (i.e., he could both read and write)
and he had passed some examination, a further enquiry about the
highest examination passed was made and the answer recorded.


7. (a) Mother tongue. Mother tongue was supposed to be the 
language in which a person's mother spoke to him or her in childhood 
or the language commonly spoken in the family. If the mother of a 
person died in his childhood, the language commonly spoken in the 
family during his childhood period was taken as mother tongue. For 
infants and deaf and dumb people the language spoken by their mother 
was recorded as their mother tongue. 

(b) Any other language. If a person knew one or more
languages (either Indian or foreign) other than the mother tongue, they
were also noted down. However, not more than two languages were
recorded for a person.


8-11. Questions No. 8 to 11 were devoted to the “Working population”
and No. 12 to those who were “Not working”.

Those who were classed as working could belong to any one of
the following categories :

(i) Working as cultivators (Q. No. 8).

(ii) Working as agricultural labourers (Q. No. 9).

(iii) Working in any household industry (Q. No. 10).

(iv) Doing work other than that covered by Q. Nos. 8, 9 and 10
(Q. No. 11). For persons who did not work either as cultivators or
agricultural labourers or in household industries, the actual work which
they were doing was recorded in question No. 11. For persons falling in
this category data were collected about :


(a) Details of work done by such persons. 


(b) Details of the industry, business, trade, profession or service 
where such persons worked. 


(c) Details of the economic status of persons who were classified in
this category were also obtained. For this purpose, the following
classification was followed : (i) Employers, (ii) Employees, (iii) Family
workers, and (iv) Single workers.


(d) Names of the business units or institutions where such
persons worked were also noted down.


12. Activity, if not working. All persons who were not
doing any work under questions 8 to 11 were treated as non-working
persons. Eight categories of such persons were mentioned, as
follows :

(i) A whole-time student or child going to school who did not
do work like making articles at home for sale.

(ii) A house worker (like a housewife) not engaged in any
remunerative work.

(iii) Dependants, including infants and children not going to
school, and persons permanently disabled due to illness, old age, etc.

(iv) A retired person (not re-employed), rentier, royalty or
dividend holder not doing any work.

(v) A beggar, vagrant or a person with an undisclosed source of
income.

(vi) Convicts in jail (not undertrials) or inmates of penal, mental
or charitable institutions.

(vii) A person in search of employment for the first time.

(viii) A person having worked before and in search of further
employment.


13. Sex. All persons in the country were recorded as males or
females. Eunuchs and hermaphrodites were treated as males for census
returns.


Some Findings of 1961 Census 


The total population according to the 1961 census was 439,235,082
(i.e., 439.235 million), of which 226,293,620 were males and the
remaining 212,941,462 females. This gives a sex ratio of 941 females per
1,000 males. The annual rate of increase in population comes out to be
2.15 per cent (21.5 per cent for the decade 1951 to 1961). The density
of population per square kilometre for the country as a whole works out
at 138. Of the total population about 82 per cent live in rural
areas and 18 per cent in urban areas. The birth rate was 41.7, the
death rate 22.8 and the expectation of life 41.2 years.
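The sex ratio and the annual growth rate can be checked from the counts themselves. The short Python sketch below does this arithmetic; the function names are illustrative only. It also contrasts two ways of turning the decadal increase of 21.5 per cent into a yearly rate: a simple average (dividing by ten) and a compound rate r satisfying (1 + r)^10 = 1.215, which comes out slightly lower, at about 1.97 per cent a year.

```python
# Counts from the 1961 census as quoted in the text.
TOTAL = 439_235_082
MALES = 226_293_620
FEMALES = 212_941_462


def sex_ratio(females: int, males: int) -> float:
    """Number of females per 1,000 males."""
    return 1000 * females / males


def annual_rates(decadal_pct: float, years: int = 10) -> tuple[float, float]:
    """Yearly growth (per cent) implied by growth of decadal_pct over `years`.

    Returns (simple, compound): the simple average divides by the number of
    years; the compound rate r solves (1 + r) ** years = 1 + decadal_pct/100.
    """
    simple = decadal_pct / years
    compound = ((1 + decadal_pct / 100) ** (1 / years) - 1) * 100
    return simple, compound


# The male and female counts add up exactly to the total.
assert MALES + FEMALES == TOTAL

print(round(sex_ratio(FEMALES, MALES)))  # 941
simple, compound = annual_rates(21.5)
print(f"{simple:.2f} {compound:.2f}")    # 2.15 1.97
```

The simple average is easier to compute by hand, but the compound rate is the one that reproduces the decadal increase when applied year after year.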
1971 Census


The 1971 census marked the completion of 100 years of decennial
census-taking in the country. The enumeration was conducted
between 10th March and 3rd April 1971 (with the sunrise of 1st April
as the reference point), except in certain areas where it was taken
between September 1970 and February 1971. Over 12 lakh enumerators
were employed on “Operation Census”.


Preparations for the 1971 population census of India started in
early 1967 when a four-day seminar was held in New Delhi on 3-6
May, 1967, for evolving the questionnaire, concepts and procedures
for the coming census, which was to be the third census of free India.
While discussing the scope and coverage of the 1971 Census, the
seminar took the view that comparability with the 1961 Census should be
maintained to the extent possible and that some of the most important
new inquiries should be added so that the necessary data would be
available at the national and State level for planning purposes.

The questionnaire evolved after this seminar was put to field
trial in the last quarter of 1967 in all the States in India. Soon after
the pre-test, the experience of collecting data was discussed at a 
conference of the Superintendents of Census Operations in the 
States at New Delhi in January, 1968. 

The Office of the Registrar General of India convened a second 
conference of the users of census data and the Superintendents of Census 
Operations in October, 1968, in order to revise the earlier drafts 
of the questionnaires in the light of the comments received by them 
from various organisations and individual research scholars and in 
the light of the experience of the first pre-test. The discussion at 
this conference led to substantial modification of the earlier
questionnaire.

Reference Date. The reference date for the population count
of 1971 was 1st April, 1971. The enumerators started their work
of enumeration on 10th March, 1971, and ended it on 31st March, 1971,
i.e., each enumerator had 21 days to finish the work of enumeration
in his area. They went round their jurisdiction on a revisit
from 1st April to 3rd April, 1971, to bring the count up to date,
i.e., as on the sunrise of 1st April, 1971, by enumerating fresh
arrivals who had not been enumerated elsewhere and new births, and
by cancelling the schedule relating to any person who might
have died prior to sunrise of 1st April, 1971. On the night of 31st
March, 1971, each enumerator counted the houseless population within his
jurisdiction.
Census Schedules. The census schedules adopted for the
1971 census consisted of :
(i) House List.
(ii) Establishment Schedule.
(iii) Individual Slip.
(iv) Population Record.

The schedules are given in Appendix II of this section.
(For ease of comparison, the documents used in the 1961 census are
also given in Appendix I.)


(i) The House List. This schedule was intended to provide
a complete list of all census houses and households in every village
and town and also the approximate population. This schedule was
canvassed during 1970 and it formed the basis for fixing the
population census enumerators' blocks for 1971 in such a way as to
ensure complete coverage without omission or overlapping of households.
The House List of 1971 by and large follows the pattern of the
1961 Census House List. The improvements are that in respect [...]
as factories or workshops were collected. At the 1971 census a
separate schedule called the Establishment Schedule was canvassed [...]

(ii) The Establishment Schedule. This was a new schedule
developed for the 1971 census. It covered all establishments,
manufacturing, trade or other, where people work [...]
of entertainment or where educational, religious, social or
entertainment services are rendered. In all such places one
or more persons had to be actually working.


The establishment schedule gathers particulars on whether the
establishment is a government, quasi-government, private or
co-operative institution, and the average number of persons working. If
it was used as a manufacturing establishment, it records whether it was
a household industry, registered factory or unregistered workshop, the
description of products processed or servicing done and the type of
power used; if it is a trade establishment, the description of goods
bought or sold and whether retail or wholesale; and if used as any other
establishment, its description, such as office, hotel, theatre, etc.
(iii) The Individual Slip. On the individual slip statistical information
was collected separately for each individual of the country. One
slip was filled in for one individual only. This constituted the basic
schedule of the 1971 population census and contained the following
17 questions.


1. Name. The name of the person being enumerated was
recorded. A newly born baby who had not been given any name was
recorded as “baby”.


2. Relationship to head. The actual relationship to the
“Head” of household was mentioned in full. In the case of institutions
like boarding houses, messes or friends living together, the manager
or superintendent, or the person who had administrative responsibility
or who by common consent was regarded as head, was recorded as
“Head” of the household. But he had to be living in the institution;
otherwise he was counted at his place of residence, and someone living
in the institution and recognised as head was mentioned instead.

In the case of visitors, boarders or employees, ‘visitor’, ‘boarder’ or
‘employee’ was recorded.

In the case of institutions, the member was recorded as ‘unrelated’.


3. Sex. Population was classified in two categories—for males, 
‘M’ and for females ‘F’ was written within the circle indicated against 
this question. 

Eunuchs and hermaphrodites were recorded as males. 

4. Age. Age was recorded in years completed on the last
birthday, i.e., if on the date the enumerator visited him a person was
30 years 11 months and 20 days old, he was recorded as 30 years.

For infants who had not completed one year, zero was recorded
and “infant” was added in brackets.
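The "completed years" rule amounts to taking the difference of calendar years and subtracting one if the person's birthday has not yet come round by the day of the visit. A minimal Python sketch, assuming the exact date of birth is known; the function name and the dates in the example are hypothetical.

```python
from datetime import date


def age_in_completed_years(born: date, visit: date) -> int:
    """Age in completed years on the day of the enumerator's visit.

    Infants who have not yet completed one year get 0.
    """
    years = visit.year - born.year
    # If this year's birthday falls after the visit date, the current
    # year of life is not yet complete, so it is not counted.
    if (visit.month, visit.day) < (born.month, born.day):
        years -= 1
    return years


# Hypothetical person visited on 15 March 1971, aged 30 years 11 months and
# some days: recorded as 30, not rounded up to 31.
print(age_in_completed_years(date(1940, 3, 26), date(1971, 3, 15)))  # 30
# Hypothetical infant born in December 1970: recorded as 0.
print(age_in_completed_years(date(1970, 12, 25), date(1971, 3, 15)))  # 0
```

Comparing (month, day) tuples avoids computing month and day differences, which is all the rule needs since part-years are simply dropped.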

5. Marital status. In answering this question the following
abbreviations were used :

N.M. : Never married
M : Currently married, i.e., those whose marriage subsists
even though the wife may not be living with the husband.
W : Widowed.
S : Separated or divorced.

For a prostitute the marital status was recorded as declared
by her.


6. For currently married women only. This question was
asked only of currently married women.

(a) Age at marriage. The age at marriage in completed years
was recorded. If a woman had been married more than once, the
age at which she got married for the first time was recorded.

(b) Any child born in the last one year. For a currently
married woman who had given birth to a child in the one year
prior to the date of enumeration ‘Yes’ was recorded; otherwise ‘No’
was written.
A still birth, i.e., a child who was born dead, was not taken into
account for this purpose.

7. Birth place. (a) Place of birth. For those born in the village
or town where they were enumerated ‘P.L.’ was written. For those
born outside the place of enumeration, the actual name of the place
was recorded.


(b) Rural/Urban. For those for whom ‘P.L.’ was recorded
against question 7 (a), ‘X’ was put against this question. For others,
if the place of birth was a village ‘R’ was written and for a town/city
‘U’ was written.

If it was difficult to decide whether the place of birth was rural or
urban, ‘Not known’ was recorded.


(c) District. For those for whom 'P.L.' was recorded against 
question 7 (a) ‘X’ was put. For others the name of district in which 
the place of birth fell was recorded. Where it was difficult to know, 
“Not known” was recorded. 


(d) State/Country. For those born within the Union Territory of
Delhi ‘X’ was put.


If the place of birth was outside Delhi, the name of the State/Union
Territory was recorded.


In the case of persons born outside India, the name of the country, such
as “Sri Lanka”, “U.S.A.” or “U.K.”, was recorded.

8. Last residence. For those who had been in the village or
town of enumeration continuously since birth ‘P.L.’ was recorded.
For others the actual name of the place, i.e., the name of the town or
village, was written.


[...] recorded.


10. Religion. For recording religion the following abbreviations 
were used : 


H : Hinduism
I: Islam 

C : Christianity 
S : Sikhism 

B : Buddhism 
J : Jainism. 


For others the actual religion was recorded. 


11. Scheduled Castes or Scheduled Tribes.*

[...] to a Scheduled Caste, his caste was recorded [...]

*1. There is no Scheduled Tribe in Delhi according to the existing
Presidential Order.
2. 30 castes have been [...] as Scheduled Castes, which could belong
only to the Hindu or Sikh religion.
12. Literacy. A person who could both read and write with
understanding in any language was treated as literate. A person
who could merely read but not write was not literate. For literates
‘L’ and for illiterates ‘O’ was recorded. All children aged 4
years or less were treated as illiterate even if they were going to
school.
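The literacy rule combines an age cut-off with a read-and-write test. A minimal Python sketch with a hypothetical function name; whether a person actually reads and writes "with understanding" is the enumerator's judgement, so it is simply passed in here as two flags.

```python
def is_literate_1971(age: int, can_read: bool, can_write: bool) -> bool:
    """Literacy test as described for the 1971 individual slip.

    can_read and can_write are taken to mean reading and writing
    with understanding, in any language.
    """
    if age <= 4:
        # Children aged 4 or less count as illiterate even if at school.
        return False
    # Reading alone is not enough: the person must both read and write.
    return can_read and can_write


print(is_literate_1971(4, True, True))    # False: below the age cut-off
print(is_literate_1971(25, True, False))  # False: reads but cannot write
print(is_literate_1971(25, True, True))   # True
```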

13. Educational level. The highest educational level attained by
a person was recorded. For those still studying, the highest level
already attained was recorded. A ‘Degree Holders and Technical
Personnel Card’ was given to graduates or postgraduates and also to
those with a technical diploma or degree.


14. Mother tongue. Mother tongue was taken as the language
spoken in childhood by the mother of the person concerned. In the case of
infants and deaf-mutes, the language usually spoken by the mother was
recorded.

15. Other languages. A separate record was made in the case of persons
who knew any languages, Indian or foreign, other than their mother
tongue. Only two languages were recorded, in the order in which the
person spoke and understood them. For this question only a working
knowledge for conversation was treated as sufficient; the ability to
read and write was not necessary.

16. Main activity. All the persons being enumerated were divided
into two broad streams of main activity, namely (1) Workers and (2)
Non-workers.

A worker was defined as a person whose main activity was participation
in any economically productive work by his physical or mental
activity.

The following categories of workers and non-workers were
formed :

16. (a) (i) Worker
Cultivator ‘C’
Agricultural Labourer ‘AL’
Household Industry ‘HHI’
Other Workers ‘OW’

16. (a) (ii) Non-worker
Household Duties ‘H’
Student ‘ST’
Retired person or a Rentier ‘R’
Dependent ‘D’
Beggars, etc. ‘B’
Institutions ‘I’
Other Non-workers ‘O’

(b) Place of work (Name of village/town). The name of village/ 
town where the person worked was recorded. If the place of work was 
the same as the village or town of enumeration ‘PL’ was written. 

(c) Name of establishment. The exact name of the factory, firm, workshop,
business house, company, shop, office, etc., was recorded. In the case
of a government establishment, the Central or State Government was specified.

(d) Nature of industry, trade, profession or service. In order to
enable proper classification of the sector of the economy in which a person
was working, the following divisions were made :

(i) Plantation, Forestry, Fishing, Livestock, etc., 


(ii) Mining and Quarrying, 
(iii) Manufacturing and repair,
(iv) Construction, 
(v) Electricity, gas and water supply, 
(vi) Transport and communications, 
(vii) Trade and commerce, and 
(viii) Professions and services. 


(e) Description of work. Under this question the description of
the work of the person being enumerated, irrespective of the type of
industry, trade, profession or service, was recorded for all those persons
engaged in household industry or classified as ‘other workers’.


For those in military service, ‘in the service of Central Government’
was recorded; no other detail was given.


17. Secondary work. This question was filled in for those persons
who were engaged in any secondary economically productive work in
addition to the main activity reported against Q. 16 (a) (i) and 16 (a) (ii).

The Individual Slip was a fairly comprehensive schedule which attempted to collect all the essential demographic, social and economic characteristics of every individual that can possibly be collected in an operation of this nature.


individual slips. The Population Record helps to give a good picture of the composition of the household, and these schedules, which will be maintained in convenient books for each administrative unit, will serve as a good frame for future surveys.


At the 1961 census, on the obverse of the Population Record then developed there was a Household Schedule in which certain particulars

ed as also details of the Household Industry. Since an agricultural


Comparison of 1961 and 1971 Individual Slips. The new features of the 1971 census Individual Slip as compared to that of the 1961

Special Features of 1971 Census
The special features of 1971 census are : 


(1) For the first time an attempt was made to collect data on current fertility.

(2) A separate card known as “Degree Holders and Technical Personnel Card” was introduced in the 1971 census. This card was to be filled in by post-graduates or those with a technical diploma or degree. It contained 16 questions. In the 1961 census a special enumeration of Scientific and Technical Personnel was carried out, and for this a separate card was issued only to persons with a recognised degree or diploma in Science, Engineering, Technology or Medicine.

(3) For the first time provisional results were issued as early as one month after the census operations (i.e., on 1st May, 1971) on the basis of a 10% sample of the individual slips.

(4) For the first time detailed information was obtained on migration.

(5) Until the 1961 census, the census data were manually tabulated in different tabulation centres. For the first time, in the 1971 census the data are being computerised. However, manual sorting has not been altogether avoided. The data pertaining to a 20% sample of the urban individual slips are being transferred on to punch cards and thence on to magnetic tape, and all the cross-tabulations for the urban data will be generated by processing these data on an electronic computer. The cross-tabulations of the rural data will be derived from a 10% sample of the individual slips, which will be processed manually. The tabulation of the House List data will be done on a 20% sample basis. The establishment schedules will be processed 100%. The data relating to housing and establishments will be processed on the electronic computer. The 1971 census is a pioneer in the introduction of sampling procedures to a considerable extent.
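Tabulating from a sample, as in (5) above, means inflating the sample counts to estimate totals for the whole population. The sketch below assumes a simple one-in-k selection of slips (every 10th slip for a 10% sample, every 5th for a 20% sample); the function names and all the tallies are hypothetical, not census figures.

```python
def expand_sample_count(sample_count: int, one_in: int) -> int:
    """Estimate a population total from a count made on a 1-in-`one_in` sample."""
    if one_in < 1:
        raise ValueError("one_in must be a positive integer")
    return sample_count * one_in


def expand_table(sample_table: dict[str, int], one_in: int) -> dict[str, int]:
    """Inflate every cell of a sample cross-tabulation by the same factor."""
    return {cell: expand_sample_count(n, one_in) for cell, n in sample_table.items()}


# Hypothetical sample tallies by broad activity category (codes as in Q. 16).
rural_10pc_sample = {"C": 4312, "AL": 1904, "HHI": 377, "OW": 866}
urban_20pc_sample = {"HHI": 210, "OW": 1875}

print(expand_table(rural_10pc_sample, 10))  # 10% sample: every 10th slip
print(expand_table(urban_20pc_sample, 5))   # 20% sample: every 5th slip
```

The inflated figures are estimates only: they carry sampling error, which is the price paid for the speed that sampling gives.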

Suggestions for Improvement. There is no denying the fact that since independence a continuous effort has been made to improve the quality of census data. However, there is still scope for improvement. A few suggestions are made here which may be of use for the 1981 census.

1. The quality of data depends largely upon the enumerators who go from door to door and collect the data. For about a month of work they are paid an honorarium* of Rs. 25, and that too after a considerable lapse of time. Most enumerators are not very enthusiastic about the census because the payment is very much disproportionate to the amount of work they are required to do, and hence most of them do not take census operations seriously. It is suggested that, to extract quality work from the enumerators, each enumerator should be paid at least Rs. 100. No doubt this would increase the financial burden, but the data obtained are likely to be more reliable.


* The honorarium paid to the enumerators in 1971 was double that paid in 1961, when a total amount of Rs. 20 was paid both for preparing the house list and for enumeration work. In the 1971 census Rs. 40 were paid for both these operations (Rs. 15 for house list preparation and Rs. 25 for enumeration). Also, the charge superintendents were paid an honorarium of Rs. 150, whereas in the earlier censuses they were paid nothing.
2. Since the beginning of census operations in India, census taking has mostly been the privilege of school teachers. Most teachers are very reluctant to take up the work, which is virtually forced upon them. Many teachers claim that the opportunity cost of census work is very high for them, as they have to miss examinerships, invigilation work, etc. Even otherwise, the normal functioning of schools is very much affected during the census period, as many teachers can leave their teaching work on the pretext that they have to attend to more urgent and important work.


A solution to this problem is provided by the backlog of educated unemployed persons in our country, who would welcome even a small opportunity if they were called upon to work. They can be temporarily employed for a period of 6-8 weeks and paid about Rs. 300 as an honorarium. It is suggested that the estimated cost of the census be spread over a period of 10 years, since the census is taken only once in 10 years. Thus in every budget a small provision for census expenditure can be made. In this way a larger amount can be spent without the burden being felt.


in the form of instruction booklets, etc., but most of them never open the booklets or care to read them. It is suggested that each enumerator

4. Compared to the 1961 census, in which there were 12 questions in the individual slip, the 1971 census questionnaire had 17 questions, some of them elaborate. When the number of questions is large, the informants do not take much care in replying, as the tendency is to answer every question hurriedly. It is, therefore, suggested that the number of questions should not exceed 15.


non-worker, rural/urban, literate/illiterate, mother tongue, etc., need reform. To take a specific case, eunuchs and hermaphrodites are

number is changing over a period of time. Hence it is suggested that a separate record of these people be kept. To take another case, a person is treated as literate if he can read and write, but it is not specified what he should be able to read and write, and so even a person who can only sign his name is treated as literate. It is suggested that a person should be counted as literate only when he can read and write a letter in any language of his choice.

6. It is interesting to note that for the first time some small use of computers is being made in the 1971 census. It is suggested that efforts should be made right from now so that by the start of the 1981 census we are able to computerise the entire set of data. The use of computers will not only save manual labour but will also provide quick results. One is surprised to note that the entire set of about 1,800 publications of the 1961 census has not yet been released. Data released after a time gap of 10 years or so will not serve any purpose except as a historical record.

Findings of 1971 Census 

Some provisional results were released by the Registrar General 
on 1st May, 1971. They are: 

Total population. According to the 1971 census records, India’s total population as on 1st April 1971 was 54,79,49,809 (i.e., about 54.79 crores). There has been an unprecedented growth rate of 24.8 per cent over the past decade (an increase from 43.9 crores in 1961 to 54.79 crores in 1971).
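The growth rate quoted above is simply the percentage change between two successive censuses. A short sketch of the arithmetic, using the crore figures from the paragraph (the function name is illustrative):

```python
def decennial_growth_rate(previous: float, current: float) -> float:
    """Percentage change in population between two successive censuses."""
    if previous <= 0:
        raise ValueError("previous population must be positive")
    return (current - previous) / previous * 100


# Figures quoted in the text, in crores.
rate = decennial_growth_rate(43.9, 54.79)
print(f"1961-71 growth: {rate:.1f} per cent")  # 24.8
```

Applied to the lakh figures in the table that follows (4,391 and 5,479), the same function gives about 24.78 per cent, which the table rounds to 24.80.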

The population of India has grown very rapidly since 1921. The figures of population growth from 1891 to 1971 are given in the table below :
POPULATION OF INDIA AT EACH CENSUS (1891-1971)

Census | Population | Increase (+) or decrease (-)   | Decennial variation
Year   | (lakhs)    | over preceding decade (lakhs)  | (per cent)
1891   | 2,359      | -                              | -
1901   | 2,363      | +4                             | +0.17
1911   | 2,521      | +158                           | +5.73
1921   | 2,513      | -8                             | -0.31
1931   | 2,791      | +277                           | +11.00
1941   | 3,187      | +397                           | +14.22
1951   | 3,611      | +424                           | +13.31
1961   | 4,391      | +780                           | +21.50
1971   | 5,479      | +1,088                         | +24.80


The total population has been estimated to increase from 547 
million on 1st March 1971 to 705 million in 1986. 
Sex ratio. Males outnumber females by about 2 crores. There were 284 million males as against 264 million females in the country, so that there were only 933 females per 1,000 males. Thus the sex ratio has further deteriorated compared to 941 in 1961. Only Kerala (1,016 females per 1,000 males) and Dadra and Nagar Haveli (1,007) show an excess of females over males.
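Indian census reports express the sex ratio as females per 1,000 males. The sketch below applies that definition to the totals quoted in the paragraph. Because 284 and 264 million are rounded to the nearest million, the result (about 930) differs slightly from the published 933.

```python
def sex_ratio(females: float, males: float) -> float:
    """Number of females per 1,000 males."""
    return females / males * 1000


males, females = 284, 264  # millions, 1971, as rounded in the text
print(round(sex_ratio(females, males)))  # 930 from the rounded totals
print(males - females)                   # 20 million, i.e. about 2 crores
```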
Literacy rates. The literacy rate (including the age group 0-4) has increased from 24.03 per cent in 1961 to 29.45 per cent in 1971, the rate for the male population being 39.45 and that for the female population 18.70. Clearly, about 70 per cent of the population is still illiterate. Chandigarh has the highest literacy rate of 61.24 per cent, followed by Kerala (60.61 per cent).


Density of population. The national average density of population is 178* per square kilometre as compared to 138 for the 1961 census. Kerala continues to be the most thickly populated State with a density of 548, followed by West Bengal (507), Bihar (324), Tamil Nadu (316), U.P. (300) and Punjab (268). Among the Union Territories the first place is taken by Delhi (2,723), followed by Chandigarh (2,254) and the Laccadive, Minicoy and Amindivi Islands (944).

The figures relating to density of population and the percentage increase of population between 1921 and 1971 are given below :


DENSITY OF POPULATION

Year | Density per sq. km | Decade  | Per cent increase in population
1921 | 81                 | 1921-31 | 11.0
1931 | 90                 | 1931-41 | 14.2
1941 | 103                | 1941-51 | 13.3
1951 | 117                | 1951-61 | 21.6
1961 | 142                | 1961-71 | 24.8
1971 | 178                |         |


Age structure. The table below shows the percentage of different 
age groups to the total population according to the 1971 census. 


AGE STRUCTURE

Age group  | Percentage of the total population
0-14       | 42.0
15-19      | 8.7
20-24      | 7.9
25-29      | 7.4
30-39      | 12.6
40-49      | 9.3
50-59      | 6.1
60 & above | 6.0
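The shares in the age-structure table should add to 100, and broader groups can be read off by simple addition. A small sketch using the table's figures (the helper name is illustrative, and "60+" stands for "60 & above"):

```python
# Percentage shares from the 1971 age-structure table.
AGE_SHARES = {
    "0-14": 42.0, "15-19": 8.7, "20-24": 7.9, "25-29": 7.4,
    "30-39": 12.6, "40-49": 9.3, "50-59": 6.1, "60+": 6.0,
}


def combined_share(groups) -> float:
    """Sum the percentage shares of the named age groups, rounded to 0.1."""
    return round(sum(AGE_SHARES[g] for g in groups), 1)


print(combined_share(AGE_SHARES))                    # 100.0, a consistency check
print(combined_share(["0-14", "15-19"]))             # 50.7, under 20 years
print(combined_share(["15-19", "20-24", "25-29",
                      "30-39", "40-49", "50-59"]))   # 52.0, aged 15-59
```

More than half of the population was thus under 20 years of age in 1971.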


Birth and death rates. The following table gives the birth and death rates per thousand during the five decades up to 1970.


BIRTH AND DEATH RATES

Decade   | Birth rate (R) | Birth rate (E) | Death rate (R) | Death rate (E)
1921-30  | 33             | 46.4           | 26             | 36.3
1931-40  | 34             | 45.2           | 23             | 31.2
1941-50  | 28             | 39.6           | 20             | 27.4
1951-60  | 22             | 41.7           | 11             | 22.8
1961-70† | N.A.           | 39.9           | N.A.           | 18.1

(R) Based on registration data; (E) estimated from census data.

* Density has been worked out after excluding population and area figures of Jammu and Kashmir.
† Provisional, based on Expert Committee Population Projections.
** Unofficial estimates.
The birth and death rate table clearly shows that there is a big difference between the birth and death rates based on the registration data (R) and those estimated (E) from the census data. This is explained by the fact that many births and deaths go unregistered in the country.
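The size of the gap can be expressed as a rough share of events missing from registration. The sketch below uses the 1921-50 rows of the table and treats the census-based estimate (E) as the true level, which is itself only an approximation; the function name is illustrative.

```python
# Registered (R) and census-estimated (E) rates per thousand, 1921-50,
# taken from the birth and death rate table.
RATES = {
    "1921-30": {"birth": (33, 46.4), "death": (26, 36.3)},
    "1931-40": {"birth": (34, 45.2), "death": (23, 31.2)},
    "1941-50": {"birth": (28, 39.6), "death": (20, 27.4)},
}


def unregistered_share(registered: float, estimated: float) -> float:
    """Per cent of events missing from registration, taking E as the true rate."""
    return (1 - registered / estimated) * 100


for decade, r in RATES.items():
    births = unregistered_share(*r["birth"])
    deaths = unregistered_share(*r["death"])
    print(f"{decade}: births {births:.1f}%, deaths {deaths:.1f}% unregistered")
```

On this reckoning roughly a quarter to nearly a third of all births and deaths escaped registration in each of these decades.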


Life expectancy. Estimates of life expectancy for males and females at each decennial census since 1901 are given below :
LIFE EXPECTANCY FOR MALES AND FEMALES

Expectation of life at birth (years)
Decade     | Males | Females
1901-1910  | 22.59 | 23.31
1911-1920  | 19.42 | 20.91
1921-1930  | 26.91 | 26.56
1931-1940  | 32.09 | 31.37
1941-1950  | 32.45 | 31.66
1951-1960  | 41.90 | 40.60
1961-1970* | 47.00 | 45.60


The above table shows that, apart from the fall during 1911-20, there was a steady though slow increase in life expectancy over the successive decades, and that the increase was considerably accelerated during 1951-60 and 1961-70. The sharp drop in life expectancy during 1911-20 was largely due to the influenza epidemic of 1918.
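The slow rise and its later acceleration can be seen by differencing successive decades. A small sketch using the male figures from the table (the female series can be treated in the same way; the helper name is illustrative):

```python
# Expectation of life at birth (years) for males, from the table above.
MALE_E0 = [
    ("1901-1910", 22.59), ("1911-1920", 19.42), ("1921-1930", 26.91),
    ("1931-1940", 32.09), ("1941-1950", 32.45), ("1951-1960", 41.90),
    ("1961-1970", 47.00),
]


def decade_changes(series):
    """Change in life expectancy from each decade to the next, in years."""
    return [
        (later[0], round(later[1] - earlier[1], 2))
        for earlier, later in zip(series, series[1:])
    ]


for period, change in decade_changes(MALE_E0):
    print(f"{period}: {change:+.2f} years")
```

The output shows the fall of 3.17 years in 1911-20, a gain of only 0.36 years in 1941-50, and then a jump of 9.45 years in 1951-60.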


Rural and Urban Population
Of the 54.79 crore people who constituted the 1971 population of India, 43.89 crores or 80 per cent live in villages and 10.91 crores or 20 per cent in cities and towns. Between 1921 and 1971 there has been a slow but steady shift towards urbanisation, as shown below :
RURAL AND URBAN POPULATION
Percentage of Total Population

Year | Rural | Urban
1921 | 88.8  | 11.2
1931 | 88.0  | 12.0
1941 | 86.1  | 13.9
1951 | 82.7  | 17.3
1961 | 82.0  | 18.0
1971 | 80.1  | 19.9
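The 1971 shares in the table follow directly from the crore figures in the paragraph. A short sketch (names are illustrative); note that 43.89 and 10.91 crores add to 54.80 rather than 54.79 because of rounding.

```python
def percentage_share(part: float, whole: float) -> float:
    """Share of `part` in `whole`, in per cent."""
    return part / whole * 100


TOTAL_1971, RURAL_1971, URBAN_1971 = 54.79, 43.89, 10.91  # crores, from the text
print(f"rural {percentage_share(RURAL_1971, TOTAL_1971):.1f}%")  # 80.1
print(f"urban {percentage_share(URBAN_1971, TOTAL_1971):.1f}%")  # 19.9

# Urban share (per cent) at each census, from the table above.
URBAN_SHARE = {1921: 11.2, 1931: 12.0, 1941: 13.9, 1951: 17.3, 1961: 18.0, 1971: 19.9}
print(round(URBAN_SHARE[1971] - URBAN_SHARE[1921], 1))  # 8.7 percentage points
```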


Gaps in Demographic Data. Some of the major gaps in demographic data as revealed by the C.S.O. are :

1. Birth and death rates separately for the rural and urban population (annual estimates should be available for each State and each district).
2. Age-sex-wise distribution of population by State annually.
3. Annual estimates of infant and maternal mortality State-wise.
4. Annual estimates of migration from each State and from rural to urban areas within a State.
5. Annual estimates of age-specific fertility rates (preferably at the State level).

*Provisional, based on Expert Committee Population Projections.
†Unofficial estimates.
Errors in Census Statistics

The task of census taking is a gigantic one as it involves enumera- 
tion of each and every individual. The success of a census primarily 
depends upon the training imparted to the enumerators and the co- 
operation received from the informants. The results of a census are not 
foolproof and the principal sources of error in census statistics are : 


1. Accidental or wilful mis-statements by the individuals enumerated.
2. Carelessness or lack of training on the part of the investigators.
3. The difficulty of uniform classification.
4. Errors in age reporting.
5. Errors due to lack of complete coverage: there may be certain tribes, etc., living in far-off places such as mountains who may be missed, and thus the population may be underestimated.
6. The possibility of entries on the schedules being wrongly inserted, or of the census clerks misreading those entries, and other errors of tabulation.


The accidental mis-statements are attributable either to ignorance on the part of the individuals concerned, or to the fact that frequently the information has to be obtained from other parties, such as other members of the family, or a boarding-house or hotel keeper. The most important difficulty relates to obtaining correct information on age from the informants. The determination of the age distribution is one of the primary objectives of almost all population censuses. The data on age are essential for the calculation of mortality and other rates, for the analysis of the factors of population change, for the preparation of population estimates and forecasts, and for commercial, actuarial and many other purposes. Accurate investigation of age involves the following difficulties :

(i) Ignorance of correct age.

(ii) Carelessness and a tendency for ages to be stated in round numbers ending in zero or five. A person aged 26 may report his age as 25, and one aged 29 as 30.

(iii) Preference for certain figures like 12, 18, 21. Certain ages confer rights on people. For example, a person becomes a major when he attains 18 years of age in India, and he is then entitled to certain privileges. A person whose age is 17 years may therefore report it as 18 years in order to obtain those benefits. Similarly, the right to vote is given to a person of 21, and so in order to vote a person of 20 years may state his age as 21. This may be described as wilful misrepresentation arising from motives of an economic, social, political or purely individual character.


(iv) Non-co-operation or wilful mis-statement by women. Women generally do not give correct information regarding their age, etc. Sometimes a woman will not give any information regarding her age, and the investigator has to depend on his own judgment in determining it. This may result in considerable errors.


(v) Illiterate persons. Illiterate persons, generally villagers, do not remember the date when they were born and so they are likely to give wrong information. In the case of old persons, ages are seldom correctly given.


(vi) It has been observed that the number of infants (less than one year of age) is generally under-reported. These deficiencies at ages 0 and 1 appear in practically all countries, including the U.S.A. Two explanations of these anomalies at ages 0 and 1 have been put forward, namely, that there is a tendency to omit young children altogether from the census, and that there are considerable mis-statements of age in the case of those children who are actually enumerated. Generally, newly married couples do not give information regarding children below the age of one year. Sometimes, owing to certain whims, the age of the infant is not reported.


For the above reasons, correct information on age is often not obtained. Some method should be devised by which the error resulting from mis-statement of age can be minimised.

(a) It is suggested that age reporting be done in a special manner. There should be a classification of days, months and years in the schedule. The population should be classified by age in terms of completed years at the last birthday, i.e., one should not ask ‘What is your age?’ but ‘What was your age at your last birthday?’

(b) It is desirable that data on age at last birthday be tabulated for each sex in at least the following age groups : under 1 year; single years from 1 to 4; and five-year groups from age 5 to the end of life.
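Suggestions (a) and (b) together amount to a simple rule: compute completed years on the census reference date, then assign the age to the recommended groups for each sex. A minimal sketch, assuming the reference date is 1 April 1971 (the date quoted earlier in this chapter); the function names and respondents are hypothetical.

```python
from collections import Counter
from datetime import date

# Assumed reference date: 1 April 1971, as quoted in this chapter.
CENSUS_DATE = date(1971, 4, 1)


def age_last_birthday(born: date, on: date = CENSUS_DATE) -> int:
    """Completed years of age on the reference date."""
    birthday_passed = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if birthday_passed else 1)


def age_group(age: int) -> str:
    """Under 1; single years 1-4; five-year groups from 5 onwards."""
    if age < 1:
        return "Under 1"
    if age <= 4:
        return str(age)
    low = age // 5 * 5
    return f"{low}-{low + 4}"


def tabulate(persons):
    """Count persons by (sex, age group); `persons` yields (sex, date_of_birth)."""
    return Counter(
        (sex, age_group(age_last_birthday(dob))) for sex, dob in persons
    )


# Hypothetical respondents.
people = [("M", date(1944, 6, 15)), ("M", date(1946, 1, 1)), ("F", date(1970, 12, 31))]
print(tabulate(people))
```

A person born on 15 June 1944 is counted as 26, not 27, because the 1971 birthday had not yet arrived on the reference date; asking for age at last birthday builds this rule into the question itself.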


Where a more detailed tabulation is possible, each single year of age may be tabulated separately by sex. In addition to providing information on the number of persons in various special age groups which are of interest in connection with studies of school attendance, marriage, literacy, economic and social characteristics, etc., this tabulation will give a useful indication of the degree of reliability of the data on age.
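One way to turn a single-year tabulation into such a reliability check is Whipple's index, a standard demographic measure of heaping on ages ending in 0 or 5 (it is not described in the text). A minimal sketch using the conventional age range 23 to 62; the sample ages are hypothetical.

```python
from collections import Counter


def whipple_index(ages) -> float:
    """Whipple's index of preference for ages ending in 0 or 5.

    Counts persons reporting 25, 30, ..., 60 and divides by one-fifth of
    all persons aged 23-62. About 100 means no heaping; 500 means every
    age in that range ends in 0 or 5.
    """
    counts = Counter(ages)
    in_range = sum(counts[a] for a in range(23, 63))
    if in_range == 0:
        raise ValueError("no ages between 23 and 62")
    heaped = sum(counts[a] for a in range(25, 61, 5))
    return heaped / (in_range / 5) * 100


uniform = list(range(23, 63))              # one person at each age: no preference
with_heaping = uniform + [25, 30, 30, 40]  # a few extra round-number reports
print(whipple_index(uniform))              # 100.0
print(round(whipple_index(with_heaping), 1))
```

The more respondents round their ages as described in (ii) above, the further the index rises above 100.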

Persons whose ages are not stated should be tabulated as a separate group, except where their ages have been estimated either by the enumerators or respondents in accordance with special instructions for determining approximate ages in such cases, or by the staff in the central census office using a careful method of estimating ages from other information on the census schedules. Where persons with age not stated have been allocated to specific age groups by these or other methods, the method should be clearly stated in the census report. Grouping of 3-7, 7-12 may be done in order to get better results.
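The text leaves the allocation method open. One simple method, shown here only as an illustration, distributes persons with age not stated in proportion to the stated distribution. It assumes that those with unknown ages follow the same age pattern as everyone else, an assumption that often fails and that the census report would then need to state. The counts and the function name are hypothetical.

```python
def prorate_not_stated(counts: dict[str, int], not_stated: int) -> dict[str, float]:
    """Allocate `not_stated` persons across age groups pro rata to `counts`."""
    stated = sum(counts.values())
    if stated == 0:
        raise ValueError("need at least one person with age stated")
    return {group: n + not_stated * n / stated for group, n in counts.items()}


stated_counts = {"0-14": 600, "15-59": 300, "60+": 100}
adjusted = prorate_not_stated(stated_counts, 50)
print(adjusted)                # {'0-14': 630.0, '15-59': 315.0, '60+': 105.0}
print(sum(adjusted.values()))  # 1050.0, so the total is preserved
```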


Vital Statistics in India 


Vital statistics deal mainly with the registration of births and deaths and provide us with a variety of other information, all of which has an influence on population growth. Registration of vital events is essential in a progressive society, and all advanced countries maintain highly efficient machinery for the purpose. As a society develops towards greater urbanisation, greater industrialisation, a greater degree of migration and greater mobility, vital statistics become essential for a wide range of judicial, administrative and personal uses.
The machinery for the collection of statistics of births and deaths in India is more than a hundred years old. In the beginning the Sanitary Commissioner with the Government of India compiled these statistics. As will be clear from the designation of the officer collecting them, the main purpose of these statistics was to exercise control over epidemics, famines, etc. For the first time, in the year 1873, an attempt was made to put the provisions with regard to births and deaths on the Statute Book by enacting the Bengal Births and Deaths Registration Act. In the year 1886 the Births, Deaths and Marriages Registration Act was passed by the Government of India. This Act provided for voluntary registration of births and deaths. At present some of our Part A States have their own birth and death registration Acts, while in other States such statistics are collected in accordance with the laws and by-laws framed by the Municipal and District Boards. These rules make registration compulsory. In the rural areas village officials like Patwaris and Chowkidars collect these statistics.

After independence, an urgent need was felt for reliable statistics on births and deaths to help in the formulation of various plans. The task of compiling vital statistics was transferred from the Sanitary Commissioner to the Director General of Health Services and the Registrar General of India in 1910. The Registrar General is also the ex-officio Census Commissioner. With the appointment of the Registrar General, considerable improvement has taken place in the compilation and availability of vital data. Most of the persons associated with the collection of vital statistics are appointed on a part-time basis. The reason for this is that more than eighty per cent of the population of India is rural and is spread over a scattered area of several thousand kilometres in more than five and a half lakh villages. The events requiring registration in any one place are so few that a whole-time agency would be considered uneconomical. Information is published State-wise and split into rural and urban areas. The main heads under which data are collected and published are births, deaths, infantile mortality, deaths by causes, death rates, vaccination statistics, and sickness and mortality of prisoners in jail.

Shortcomings of Vital Statistics. The position with regard to the availability of vital statistics is far from satisfactory, and efforts to improve these statistics need to be stepped up. The main limitations of these statistics are :


1. Part-time agency. Most of the staff associated with the collection of vital statistics in India work on a part-time, ex-officio basis. Part-time officers have been appointed from the Revenue, Police, Health and Panchayat Departments on the basis of availability. Such employees find little time to attend to the duties connected with the collection and processing of vital statistics, and as such the data are half-heartedly collected and cannot be much relied upon.


2. Delay in publication. There is no whole-time agency upon which reasonable standards of efficiency can be imposed and enforced, and as such the publication of vital statistics is considerably delayed. Most local bodies, which are primarily responsible for collecting vital data, have prescribed time limits ranging from one day to 14 days within which a vital event must be reported. But failure to report in time is not taken seriously, and there is no legal provision for action against those who fail to report a vital event within the prescribed time.


3. Inaccuracy. Vital statistics in our country are not very accurate. This is for two reasons: first, there is no legal sanction authorising the collection of vital data; and secondly, the officials and the persons responsible for supplying the information feel immune from action.


4. Lack of uniformity. There is no uniformity in the procedure of registration in the various States. It is necessary to collect these statistics in a well-organised and uniform fashion throughout the country. In England the Ministry of Health is responsible for the collection of these figures.


5. Non-co-operation from the masses. People do not take an interest in vital statistics and fail to realise their significance. This difficulty is particularly acute in rural areas; in urban areas the problem exists to a lesser degree.


Suggestions for Improvement. A number of committees and commissions, such as the Central Advisory Board of Health (1939), the Health Survey and Development Committee (also known as the Bhore Committee, 1946), the Vital Statistics Committee (1949) and the Committee to Study and Report on Patterns of Statistical Units for Health Departments (1960), have suggested measures to bring about improvement in vital statistics. Some of these suggestions are :

1. Compilation of vital statistics should be done on a uniform basis throughout India. The Central Government should pass an Indian Vital Statistics Act authorising the collection of data and making it obligatory upon individuals to supply the data.

2. Compilation and processing of vital data should be centralised and mechanised, and a standard proforma should be used to this end.

3. Training in the compilation of data should be imparted to the registration staff, particularly in the field.

4. The public should be educated to realise the significance of vital data.

5. A certificate of registration must be issued to the informant of the family in which the vital event takes place.

6. Vital statistics should be closely linked with population statistics.


In view of the great significance of family planning, it would be desirable to continue to collect regular statistics of family planning efforts, such as the number of family planning clinics, the number of sterilisation operations performed, the number of loop insertions, the distribution of contraceptives, etc. The data should be available not only by States and rural/urban areas but also by size classes of household income and by other meaningful socio-economic classifications.
CENSUS OF INDIA, 1961
(To be filled up during enumeration)

Is this an institution ?

(ii) PART I - HOUSEHOLD SCHEDULE

Location Code ____________
Full name of Head of household ____________

A. CULTIVATION

                                                      | Local name of right on land | Area in acres
1. Land under cultivation by Household
   (i) owned or held from Government                  | ________ | ________
   (ii) held from private persons or institutions
        for payment in money, kind or share           | ________ | ________
   (iii) Total of items (i) and (ii)                  | ________ | ________
2. Land given to private persons for cultivation
   for payment in money, kind or share                | ________ | ________

B. HOUSEHOLD INDUSTRY

Household industry (not on the scale of a registered factory) conducted by the Head of the Household himself and/or mainly by members of the household at home or within the village in rural areas, and only at home in urban areas.

     | Name of Industry | Number of months during which conducted
(a)  | ________         | ________
(b)  | ________         | ________

C. WORKERS AT CULTIVATION OR HOUSEHOLD INDUSTRY

Members, including Head of family, working and/or hired workers, if any, kept whole time during current or last working season at:

                                      | Members of family working        | Hired   | Total
                                      | Head | Other males | Other females | workers |
1. Household cultivation only         |      |             |               |         |
2. Household Industry only            |      |             |               |         |
3. Both in Household Cultivation
   and Household Industry             |      |             |               |         |

Date ________   Signature of Supervisor ________
Date ________   Signature of Enumerator ________
SLIP FOR THE SPECIAL ENUMERATION OF
SCIENTIFIC AND TECHNICAL PERSONNEL

CENSUS OF INDIA, 1961 : SCIENTIFIC AND TECHNICAL PERSONNEL

Only a person with a recognised Degree or Diploma in Science, Engineering, Technology or Medicine should fill in this card.
Read carefully before filling in.
Tick (✓) within brackets provided where applicable.

1. Name ____________
2. Date of birth ____________
Designation and office address (if employed) ____________
Permanent Address ____________
[Census Location Code]
(a) Male ( )   (b) Female ( )
(a) Never married ( )   (b) Married ( )
On Feb. 1, 1961 were you :
   (a) Employed ( ) if so, monthly total income ________
   (b) Full time student ( )
   (c) Unemployed Student ( )
   (d) Unemployed ( ) if so, how long ... years ... months
   (e) Retired ( )

8. Academic qualification (Answer fully)
   Degree/Diploma | Subjects taken | Division | Year of Passing

If employed fill in Qs. 9-12.

9. Nature of employment :
   (a) Teaching in School ( )
   (b) Teaching in College ( )
   (c) Teaching in Industry ( )
   (d) Teaching outside industry ( )
   (e) Non-technical ( )
10. Any Research Assignment : Yes ( )   No ( )
11. Where employed : Public Sector ( )   Private Sector ( )   Self-employment ( )
12. How employed : Permanent ( )   Temporary ( )   On contract ( )   Research Scholar ( )   Otherwise ( )

Date ________   Signature ________
(iv) 1961 CENSUS SLIP

1. (a) Name
   (b) Relationship to Head
2. Age (Last Birthday)
3. Marital Status
4. (a) Birth Place
   (b) Born R/U
   (c) Duration of residence if born elsewhere
5. (a) Nationality
   (b) Religion
   (c) SC/ST
6. (a) Literacy
   (b) Education
7. (a) Mother-tongue
   (b) Any other language
8. Working as Cultivator
9. Working as Agricultural Labourer
10. Working at Household Industry
   (a) Nature of Work
   (b) Nature of Household Industry
   (c) If Employee
11. Doing work other than 8, 9, 10
   (a) Nature of work
   (b) Nature of Industry, Trade, Profession or Service
   (c) Class of Worker
   (d) Name of Establishment
12. Activity, if not working
Sex
CENSUS OF INDIA, 1971
ESTABLISHMENT SCHEDULE

Name of District ________   Code No. ________   Name of Village or Town ________
Name of Taluk/Tehsil/Thana/Anchal/Island ________   Code No. ________   Name or No. of Ward/Mohalla/Enumerator's Block ________   Code No. ________
Date ________
Serial ________   Pad No. ________
CONFIDENTIAL          CENSUS OF INDIA, 1971
Individual Slip       Slip No. ............
Household No. [   ]   Location Code ________

5. Marital status ________
6. For currently married women only :
   (a) Age at marriage ________
16. MAIN ACTIVITY
   (a) Broad category :
       (i) Worker (C, AL, HHI, OW)
       (ii) Non-worker (H, ST, R, D, B, I, O)
TRY YOURSELF


1. “Census is not merely the counting of heads but it also gives a fund of valuable information.” Comment on this statement in the light of the census of 1971.
2. (a) Describe briefly the machinery and procedure for the census of population in a country. What precautions are necessary in such an operation ?
(B. Com., Mysore, 1967 ; B. Com., Nagpur, 1969)
(b) Explain the importance of Population Census and describe the organisation for census operations in India with suggestions for improvement.
(B. Com., Madras, 1975)


3. Give a brief account of the procedure followed in taking the decennial population census of India. State the main difficulties that arise during the census.
(B. Com., Madras, 1972)

4. How is the population census conducted in India ? Examine critically the information collected at the time of the census of 1971.

5. Explain the need and scope of population census. What items of information are generally collected in a population census ? Mention the salient features of the 1971 Indian census. What improvements do you suggest for the census of 1981 ?
(B.A. Hons., Econ., Delhi, 1973)
6. (a) Indicate the nature of information collected in a population census. What were the special features of the 1971 population census in India ?
(B. Com. Hons., Econ. Delhi, 1968)
(b) Write an essay on the population census in India. (B. A., Madurai, 1974)


7. Discuss the special features of the 1971 Census of Population of India. What suggestions will you offer to make the population statistics more reliable and useful ?
8. Discuss the main features of the population statistics available in India and give suggestions for improvements. (M. A. Econ., Punjab, 1972)
9. “In 1961 census emphasis was laid on the collection and analysis of important basic economic data relating to (a) agriculture, (b) household activity, and (c) other economic activities of the individual and the State.”
Critically examine the above statement. (B. Com., Punjab, 1969)
10. Discuss briefly the main features of the census held in India in 1971 and 
criticise it from the statistical point of view. In the light of your criticism offer
suggestions for 1981 census. 
11. Give a critical account of the vital statistics in India. 
(M. Com., Gorakhpur, 1969) 
12. What are the main findings of the 1971 Population Census ? In what ways
is the 1971 census different from the 1961 census ?
13. What are vital statistics ? How are they collected in India and what are 
their shortcomings ? 
14. State the difference between the de jure and de facto methods of conducting
a census of population.
Appraise the special features of the Census of Population recently held in 
India. (M. Com., Delhi, 1971)
15. What are the main features of the population statistics in India ? What
suggestions would you offer to make them more reliable and useful ? 
(B.A., Madurai, 1975) 
16. (a) Describe the organisational set-up of the census of population in
India in 1971.
(b) Which items of information are usually asked for at a population
census ? Give five examples.
(c) Which items of information are not asked for at a population census ? 
Give two examples with reason for their exclusion. (B. Com., Bombay, 1976) 


17. Discuss the important methods of studying population growth and point out
their shortcomings. (M. Com., Gwalior, 1975) 


Section 


7 Price Statistics 




Society and as such price statistics are regarded as the most important
economic data reflecting changes in the economy of a country. However,
changes in the prices of different commodities affect different sections
of people differently and as such it becomes necessary to compile
different types of price statistics. Price statistics relate to (i) wholesale
prices, and (ii) retail prices. Price statistics available in India fall into
two categories :


(A) Price Quotations 
(B) Price Index Numbers. 


Usefulness of Price Statistics 


The price statistics are extremely useful as will be clear from the 
following points : 


employers for adjusting wages and salaries of the employees. When
prices rise the employees put forth a demand for higher wages. In the
absence of accurate and adequate price data correct decision would be


2. Price data are needed to control prices of commodities. 
3. Price data are needed in order to evaluate the terms of trade. 


4. Price data reveal the inflationary and deflationary pressures 
and help in taking suitable corrective action. 

5. Price statist 
selected commodities from the very early days of the development of 


The wholesale price statistics are collected by (1) the Office of 
the Economic Adviser, (2) the Directorate of Economics and Statistics,
Ministry of Agriculture and Irrigation, and (3) the Directorate General
of Commercial Intelligence and Statistics. Besides, the Statistical 
Bureaus in the States collect wholesale prices of various commodities. 
Wholesale price data collected by the office of the Economic
Adviser can be classified under the following heads : 


1. Price data for the index of wholesale prices. 
2. Daily market rates. 


3. Foreign prices of selected export commodities and raw
materials.
4. C.I.F. and market prices of selected import commodities. 


Index Number of Wholesale Prices in India 


The wholesale price index numbers are of two types — general pur- 
pose and sensitive. The general purpose index is intended to reflect 
the changes in the general price level and, therefore, it is necessary to 
include as many commodities as possible. On the other hand, in a
sensitive index only a few commodities are included which are likely to
react quickly to market sentiments. At times the sensitive indices are 
compiled for a particular commodity/commodity group also. 


The Economic Adviser used to publish formerly a sensitive weekly 
index number of wholesale prices based on 23 commodities divided into 
4 groups, namely (1) Food and tobacco, (2) Agricultural commodities,
(3) Raw materials, and (4) Manufactured articles. The week ending
19th August, 1939 was taken as base. In all 46 quotations of various
items were obtained. The index was computed as the geometric mean 
of the price relatives of 23 commodities with equal weights. 
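As an illustration of the averaging step only, the sketch below computes a geometric mean of price relatives. The function name `geometric_mean_index` and all prices are hypothetical. With no weights it treats every commodity equally, as the sensitive index did; passing weights gives a weighted geometric mean of the kind used in the later food and general purpose indices.

```python
import math


def geometric_mean_index(current, base, weights=None):
    """Return the (weighted) geometric mean of price relatives.

    Each relative is current / base * 100. With weights=None every
    commodity gets the same weight, as in an unweighted sensitive index.
    """
    if weights is None:
        weights = [1.0] * len(current)
    total_weight = sum(weights)
    # Average the logarithms of the relatives, then take the antilog.
    log_sum = sum(
        w * math.log(p1 * 100 / p0) for p1, p0, w in zip(current, base, weights)
    )
    return math.exp(log_sum / total_weight)


# Hypothetical prices for three commodities (not the actual 1939 data).
base_prices = [10.0, 20.0, 5.0]
current_prices = [12.0, 25.0, 5.0]
print(round(geometric_mean_index(current_prices, base_prices), 2))
```

With equal weights, a doubling of one price and a halving of another leave a two-commodity index at exactly 100, a property often cited in favour of the geometric mean for averaging price relatives.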


Some major shortcomings of the index were : First, it was an
unweighted index. Secondly, it did not include important items like
salt, pulses, etc., which commonly enter the consumption of people.
Thirdly, some unimportant items like coffee, copra, etc., were included.
Fourthly, the number of items included in the index was very small.
Thus, the index was unrepresentative of the true economic conditions 
in the country. It was discontinued in December, 1947. 


In the year 1945, the Economic Adviser’s Office compiled a new
index for ‘Food Articles’ with the year ending August, 1939 as base.
The index was compiled as a weighted geometric mean of price
relatives, the weights being proportional to the value of the marketable
surplus of the various commodities during the year 1938-39. In 1947
the Economic Adviser’s Office compiled another index, the general
purpose index, with the year ending August 1939 as base. It covered
78 commodities divided into 18 sub-groups and 5 groups. In all 215
quotations were collected for these commodities from various centres
on or about Friday of each week. In computing the sub-group, group
and all-commodities index numbers, a weighted geometric mean was
used. With the passage of time the base year became remote,
prices came to establish new norms, the relative importance of commodities
in the country’s economy changed and statistics of prices began to be
collected for many new commodities. Accordingly, this series, which
started from January, 1947, was discontinued in April, 1960, when a
Revised Series was issued. The revised index number of wholesale
prices compiled by the Office of the Economic Adviser included 112
commodities comprising 555 individual quotations. The commodities
were divided into 20 sub-groups and 5 groups. In the construction of
the index Laspeyres' formula is used. The average used is the
arithmetic mean. Earlier the Economic Adviser’s index was constructed by
using a weighted geometric mean. The price relatives from the weekly
quotations are calculated as percentage ratios which the current price


the sub-group or the group index ; and the weighted arithmetic average 
of these gives the final all-commodities index. Symbolically, the index 
is represented in the following manner : 


Let $P_{ijk}$ denote the price of the ith commodity in the jth week of the
kth year, $P_{i0}$ the corresponding average price of the ith commodity
in the base year, and $W_i$ the weight attached to the ith commodity,
based on estimates of marketed values of domestic produce. The
general index for the jth week of the kth year ($I_{jk}$) is then :

$$I_{jk} = \frac{\sum_i W_i \left( \dfrac{P_{ijk}}{P_{i0}} \times 100 \right)}{\sum_i W_i}$$

The indices of wholesale prices are published in a weekly bulletin
entitled Index Numbers of Wholesale Prices in India (Revised Series),
published by the office of the Economic Adviser, Ministry of Industrial
Development, Internal Trade and Company Affairs, Government of
India. Index numbers are available for each major group, sub-group


1st January, 1977 the base of the index number of wholesale prices has
been changed to 1970-71 = 100.
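A minimal sketch of the Revised Series calculation, assuming the weighted arithmetic mean of price relatives described above: sub-group indices first, then a weighted average of these. The data, the weights and the function names are hypothetical and serve only to show the arithmetic.

```python
def wholesale_price_index(current, base, weights):
    """Weighted arithmetic mean of price relatives:
    I = sum(W_i * (P_i1 / P_i0) * 100) / sum(W_i)
    """
    relatives = [p1 * 100 / p0 for p1, p0 in zip(current, base)]
    return sum(w * r for w, r in zip(weights, relatives)) / sum(weights)


def combine_indices(indices, weights):
    """Weighted arithmetic mean of sub-group (or group) indices."""
    return sum(w * i for w, i in zip(weights, indices)) / sum(weights)


# Hypothetical sub-groups; weights stand in for marketed values of domestic produce.
groups = {
    "food": {"current": [110.0, 60.0], "base": [100.0, 50.0], "weights": [4.0, 6.0]},
    "metals": {"current": [300.0], "base": [250.0], "weights": [10.0]},
}

sub_indices = [
    wholesale_price_index(g["current"], g["base"], g["weights"]) for g in groups.values()
]
# Each sub-group carries the total weight of its items.
sub_weights = [sum(g["weights"]) for g in groups.values()]
all_commodities = combine_indices(sub_indices, sub_weights)
print(sub_indices, all_commodities)
```

Because each sub-group is weighted by the total of its item weights, the two-stage result equals the weighted mean taken over all items at once.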


Statistics with regard to retail prices are more unsatisfactory than
the wholesale price statistics. A number of daily, weekly, monthly and
annual papers and journals contain retail prices. The Salt Commissioner
collects retail prices of salt which are published in the Statistical
Abstract. The Reserve Bank of India publishes weekly,
monthly and annually the retail prices of gold and silver. The
Directorates of Economics and Statistics of the States also publish retail prices
however, is not the 
€ but the unstandard 


a legal requirement, and the standardisation 
variety of commodities. 


As regards index numbers of retail prices the chief index numbers 
compiled in the country are : 


1. Labour Bureau Index Number of Retail Prices (Urban
Areas). The Labour Bureau, Ministry of Labour, used to compile
and publish regularly monthly index numbers of retail prices for 18
centres in the urban areas. The commodities included were divided in 
three main groups : (i) all articles of food, (ii) fuel and lighting,
and (iii) miscellaneous. The base year was 1944. These indices which
were published in the Indian Labour Gazette have been discontinued 
and the Labour Bureau is publishing instead only the price relatives of 
certain selected commodities with 1949 as the base. 


2. Labour Bureau Index Number of Retail Prices (Rural
Centres). The Labour Bureau used to compile and publish regularly 
monthly index numbers of retail prices for 11 centres in the rural areas 
also. The base year was 1944. These indices have also been disconti- 
nued. Now the Labour Bureau is publishing instead only the price 
relatives of certain selected commodities for rural centres with 1960 as 
the base year. 

Labour Bureau Consumer Price Index Numbers 


The Labour Bureau is the chief agency publishing consumer price 
index numbers. There are three different series of consumer price 
index numbers compiled at the all-India level. These are: 


(1) Consumer price index number for industrial workers. 
(2) Consumer price index for non-manual employees. 
(3) Consumer price index for agricultural labour. 


1. Consumer Price Index for Industrial Workers. Earlier
the Labour Bureau was compiling an interim series of all-India working
class consumer price index with 1949 as base. The index was obtained
as a weighted average of working class consumer price indices for 27
centres, out of which 15 were covered by the Labour Bureau and the
rest by the State Governments. In order to make the index more
representative a new series with base 1960 = 100 is being constructed
for industrial workers. This series is based on the results of the fresh
family living surveys in 50 industrial centres conducted by the Labour
Bureau during 1958-59.


The new series is based on 100 items and the index is computed
on the basis of Laspeyres' formula. The index numbers have been published
regularly since December 1962 by the Labour Bureau in its monthly
publication Indian Labour Journal.
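Laspeyres' formula uses base-period quantities. The sketch below, with hypothetical prices and quantities, shows that its aggregative form, sum(p1 q0) / sum(p0 q0) x 100, gives the same figure as a weighted mean of price relatives whose weights are base-period expenditures p0 q0. This identity is what allows expenditure weights from a family living survey to be applied directly to price relatives. Function names are illustrative only.

```python
def laspeyres_aggregative(p0, p1, q0):
    """Laspeyres index in aggregative form: sum(p1*q0) / sum(p0*q0) * 100."""
    return sum(a * q for a, q in zip(p1, q0)) / sum(b * q for b, q in zip(p0, q0)) * 100


def laspeyres_from_relatives(p0, p1, q0):
    """Same index as a weighted mean of relatives, weights = base expenditure p0*q0."""
    weights = [b * q for b, q in zip(p0, q0)]
    relatives = [a * 100 / b for a, b in zip(p1, p0)]
    return sum(w * r for w, r in zip(weights, relatives)) / sum(weights)


# Hypothetical base-year prices, current prices and base-year quantities.
p0 = [2.0, 5.0, 10.0]
p1 = [3.0, 5.0, 12.0]
q0 = [10.0, 4.0, 2.0]

agg = laspeyres_aggregative(p0, p1, q0)
rel = laspeyres_from_relatives(p0, p1, q0)
print(agg, rel)
```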


2. Consumer Price Index for Non-manual Employees. The
index numbers for urban non-manual employees are compiled by the
CSO. The year 1960 is taken as the base and the index relates to 45
centres, of which 15 are State capitals, 29 are other cities and one is Delhi.
These indices measure changes in the prices paid by the middle class.
The index covers 180 items classified into 5 main groups and 23
sub-groups. These are :


I. Food, beverages and tobacco: 
(1) cereals, (2) pulses, (3) fats and oils, (4) fish and eggs, 
(5) milk and milk products, (6) condiments and spices, 
(7) fruits and vegetables, (8) sugar, (9) non-alcoholic 
beverages, (10) prepared meals and refreshments, and (11) 
pan, supari and tobacco. 
II. Fuel and light.
III. Housing.
IV. Clothing, bedding and footwear.
V. Miscellaneous :
(1) Medical care, (2) Education and reading, (3) Recreation and
amusement, (4) Transport and communication, (5) Personal
care and effects, (6) Household requisites and (7) Others.


Technique of construction. The CSO conducted in 1958-59 a
middle class Family Living Survey at various centres in the States, the
number of centres in each State being in proportion to the urban
population of 1951. About 30,000 families belonging to (i) professional,
technical and related classes, (ii) administrative, executive and
managerial classes, (iii) clerical and related classes, and (iv) sales workers
class were covered. The survey investigated the conditions and living
levels of the middle classes in the income range of Rs. 100 to Rs. 750
per month, deriving a major part of their income from non-manual,
non-agricultural sources. The survey assisted in arriving at a weighting
diagram for the index.


Price quotations are collected monthly from Calcutta and Bombay 
(36 shops each), Delhi, New Delhi, Kanpur and Madras and six other 
centres (24 shops each) and from 34 centres (at 12 outlets). 


3. Consumer Price Index for Agricultural Labourers. The
Labour Bureau compiles an index for agricultural labourers for all
India and for the States on a monthly basis. These indices are then
averaged out on an annual basis also. The year 1960-61 is taken as
the base. The items under the index are divided into four groups :

Food. 

Fuel and light. 

Clothing, bedding and footwear. 
Miscellaneous. 




The index is a weighted one and Laspeyres' formula is used.
The weights are according to the expenditure pattern and have been
arrived at for the 39 ALE zones. State weights have been obtained
from the zonal weights. The price quotations are obtained from the
NSSO. These are based on a field sample of 442 villages spread over
39 ALE zones. The index is constructed in two stages : first, an index
for each State is arrived at, and later the all-India index is constructed
by weighting the State indices according to expenditure in each State on
various groups. Both the State and the all-India indices are weighted.
The weight of an unpriced item is added to an allied item.
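The two-stage construction can be sketched as follows, as a simplification under stated assumptions: each State index is a weighted mean of item relatives, the weight of an unpriced item is moved to a hypothetical allied item (assumed to be priced), and the all-India figure weights each State by a single hypothetical expenditure total rather than by expenditure on each group as described above. All items, relatives and weights are invented.

```python
def state_index(relatives, weights, allied):
    """Weighted mean of item relatives for one State.

    relatives: item -> price relative, or None when the item could not be priced.
    weights:   item -> expenditure weight.
    allied:    unpriced item -> priced item that absorbs its weight (hypothetical).
    """
    w = dict(weights)
    for item, rel in relatives.items():
        if rel is None:
            # Transfer the weight of an unpriced item to its allied item.
            w[allied[item]] += w.pop(item)
    priced = [item for item, rel in relatives.items() if rel is not None]
    return sum(w[i] * relatives[i] for i in priced) / sum(w[i] for i in priced)


def all_india_index(state_indices, state_weights):
    """Weighted mean of State indices (weights: hypothetical expenditure totals)."""
    total = sum(state_weights.values())
    return sum(state_indices[s] * state_weights[s] for s in state_indices) / total


state_a = state_index(
    {"rice": 150.0, "wheat": None, "kerosene": 120.0},
    {"rice": 50.0, "wheat": 10.0, "kerosene": 40.0},
    {"wheat": "rice"},
)
state_b = state_index(
    {"rice": 130.0, "kerosene": 110.0},
    {"rice": 70.0, "kerosene": 30.0},
    {},
)
india = all_india_index({"A": state_a, "B": state_b}, {"A": 3.0, "B": 2.0})
print(state_a, state_b, india)
```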


Consumer Price Index Numbers of Various States 


Various State Governments are compiling their own cost of living
index numbers and some of them are regularly published in the Indian
Labour Gazette. The States are publishing these in their own Gazettes
or Bulletins. There is no uniformity with regard to the number of 
items and the base period. However, the technique of construction 
follows a general pattern with minor changes. The items selected for 
these indices are almost uniformly classified into the following 5 major 
groups : 

Food. 

Fuel and lighting. 

Clothing. 

House rent. 

Miscellaneous. 

While the groups are mostly common in each case, the items 
covered are widely different from State to State. Amongst the consumer 
price index numbers published by the States special mention may be 
made of the Bombay Working Class Consumer Price Index Number, 
and the Kanpur Working Class Consumer Price Index Number.


Criticism of the Consumer Price Index Numbers 

1. The coverage of the index numbers is very limited. 

2. The indices are based on the Family Budget enquiries which 
are very old. New family budget enquiries are overdue and should be 
made without further delay. 

3. The State indices are not comparable because of the differences
in the number of items included and the base year. 


STATISTICS OF AGRICULTURAL PRICES 


Statistics of agricultural prices help in the proper analysis of the
progress made by the agricultural sector of India's economy and also in
the formulation of a suitable agricultural policy for the future. Detailed
statistics are needed for both wholesale and retail prices of all
agricultural commodities for different types of markets such as primary,
secondary and terminal markets. The following statistics of agricultural
prices are available :

1. Farm (Harvest) Price Statistics—collected by the State Bank 
of India and by State Governments. 

2. Harvest Season Prices—collected by the Directorate of Economics
and Statistics (DES).

3. Index Numbers of Harvest Price of Principal Crops in India— 
compiled by the DES.

4. Statistics of Wholesale Prices of Agricultural Commodities.


5. Statistics of Procurement Prices. 

The following are the important publications dealing with agri- 
cultural prices : 

1. Bulletin of Agricultural Prices (Weekly). 

2. Agricultural Situation in India (Monthly). 

3. Agricultural Prices in India (Annual). 
Besides these publications of the DES, the other sources dealing
with agricultural price statistics are : 


(i) The Index Numbers of Wholesale Prices in India. 
(ii) The Indian Trade Journal.


(iii) The Wholesale and Retail Price Reports brought out by 
different State Governments. 


STATISTICS OF SECURITY PRICES 


Official series of index numbers of security prices were first
compiled and issued by the office of the Economic Adviser with the financial
year 1927-28 as base. These indices covered 150 scrips and were
compiled up to the year 1949, when the work of compiling and publishing
the official Index Numbers of Security Prices was transferred to the
Reserve Bank of India. The Reserve Bank of India Index Numbers of
Security Prices were compiled with the calendar year 1938 as base.
The quotations of scrips were obtained from the published lists of the
Bombay, Calcutta and Madras Stock Exchanges. The index included 398 scrips,
which were selected on the basis of the importance of the concern and the
activity of the scrip in the market. The scrips were divided into three
groups. They were : (i) Government and semi-Government securities,
(ii) fixed dividend industrial securities, and (iii) variable dividend
industrial securities. The first group was divided into three, the
second into nine and the third into nineteen sub-groups. Two sets of
index numbers used to be prepared, one the regional index
number and the other the all-India index number.


The index number of security prices was revised from July 1957
with the comparison base shifted to 1952 = 100. The index covered 512


bed has also been included in addition to the four centres, i.e., Bombay, 


compilation of the index 578 scrips have been included. The grouping
of selected scrips is broadly on the lines of the Standard Industrial
Classification proposed by the CSO. The index is now being
constructed with 1970-71 as base. The Reserve Bank of India also publishes
the indices of yield on Government and industrial securities and for
this purpose the year 1970-71 is taken as base.


Limitations of Price Statistics 


Despite the fact that price statistics have considerably improved
both in respect of quality and quantity, they suffer from a number of
limitations, important amongst which are :


1. No information is available about the gaps between the prices paid by
the consumers and the prices charged by the sellers, and the margins of
the various middlemen.


2. Data relating to prices are collected by the various departments
of the Government of India and State Governments. There is
very little co-ordination between them, with the result that there is a big
wastage of resources.
3. We are not collecting data relating to prices paid by the
cultivators for their farm requirements and the prices which they get 
for their produce. 

4. Several agencies collect quotations of prices and there is very
little uniformity in the quotations collected by them, with the result that
different people draw different conclusions to suit their purpose.


5. There is no continuity in the collection of prices from period 
to period. Data once collected are allowed to continue with slight 
modifications. 

6. There is considerable delay in the publication of most of the
price statistics, with the result that they are of little use to the producers,
administrators and businessmen.

7. The market centres selected for obtaining the price quotations
are not satisfactory. 


Suggestions for the Improvement of Price Statistics 


Price statistics are extremely useful in evolving a suitable price
policy. The Agricultural Prices Enquiry Committee, popularly known 
as the Thapar Committee, made a number of suggestions for the 
improvement of these statistics. Some of these suggestions are : 


1. There is need for greater co-ordination between different 
agencies collecting price statistics at the Centre and State levels. 


2. The commodities—particularly the agricultural commodities 
—need proper co-ordination in the absence of which price data cannot 
be suitably collected. 

3. There is multiplicity of weights and measures in our country. 
Price quotations should be obtained in terms of standard weights and 
measures and where it is not possible the enumerators should be asked 
to collect quotations in terms of local weights and then convert them 
into standard weights. 

4. There is need for greater promptness in the publication of 
Price statistics. The Agricultural Prices Enquiry Committee recom- 
mended the publication of either printed or cyclostyled weekly bulletin 
of prices. It suggested that regional index numbers of prices should 
first be constructed and then these regional indices should be compiled 
to form the all-India index of agricultural prices. 

5. The conceptual differences about various terms should be
removed. These differences make the data collected by the different
States non-comparable.

6. There are too many publications containing price statistics,
causing confusion to the reader. As far as possible the data should be
made available at one place.

7. There is scope for improvement in the method of obtaining
price quotations. The main consideration in selecting a centre should
not be the convenience of obtaining the price quotations, but whether they
are patronized by the section of population to which the index relates
or whether they are important market centres of those commodities.
Only those quotations should be taken into account which relate to
modal prices, i.e., prices at peak periods of marketing at which most of
the transactions take place.
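Suggestions 3 and 7 above can be combined into a small sketch for cleaning one day's quotations: convert each price from local weight to a price per kilogram, then pick the modal price, read here as the price at which the largest quantity was transacted. The conversion factors, market names and quotations are all hypothetical; real local units vary from place to place.

```python
from collections import defaultdict

# Hypothetical conversion factors: kilograms in one local unit of weight at
# each market. Real factors vary by locality and would come from a survey.
KG_PER_LOCAL_UNIT = {"market_a": 0.933, "market_b": 1.25}


def price_per_kg(price_per_local_unit, market):
    """Convert a quotation in local weight to a price per standard kilogram."""
    return price_per_local_unit / KG_PER_LOCAL_UNIT[market]


def modal_price(transactions):
    """Price at which the largest quantity changed hands.

    transactions: (price, quantity) pairs recorded in one market on one day.
    """
    volume = defaultdict(float)
    for price, quantity in transactions:
        volume[price] += quantity
    return max(volume, key=volume.get)


# Hypothetical quotations at market_b: (Rs. per local unit, quantity traded).
# Quantities stay in local units; only their ranking matters within one market.
day = [(2.25, 10.0), (2.50, 40.0), (2.75, 15.0), (2.50, 20.0)]
standardised = [(price_per_kg(p, "market_b"), q) for p, q in day]
print(modal_price(standardised))
```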


TRY YOURSELF 


1. Critically examine the construction of either the Economic Adviser's
Index Number of Wholesale Prices or the Consumer Price Index Number as
compiled by the Government of India. (M. Com., Gorakhpur, 1970)


2. Examine the adequacy and accuracy of price statistics available in India. 
(B. Com., Nagpur, 1972) 
3. Describe step by step the method of construction of Economic Adviser's
Index of Wholesale Prices. (B. Com., Delhi, 1973)


4. How are Retail Price Index Numbers prepared in India? What are the 
publications relating to them ? (M.A. Econ., Jabalpur, 1974) 


5. How are the Wholesale Price Index Numbers prepared in India? What 
are the publications relating to them ? (M.A. Econ., Jabalpur, 1975) 


6. How is the Economic Adviser’s Index Number of Wholesale Prices constructed
in India ? What suggestions would you like to offer to make this index more
reliable and comprehensive ? (M. Com., Gwalior, 1975)


7. Write a note on the construction of wholesale price index numbers.
Illustrate your answer with reference to such an index number available at present in
our country. (M. Com., Gwalior, 1976)


Section
8 National Income Statistics 
National income statistics are the most important single set of
figures which serve the purpose of giving a bird's-eye view of a nation's
economy. They are absolutely indispensable for formulating suitable
economic policies. Recent research has considerably improved the
technique of compilation of these figures and has also explored new
fields where these figures can be profitably utilized. The compilation
of social accounts, which deal with the flow of money between different
sectors of a nation's economy, is the latest development in this
connection.


What do we mean by national income ? It has been a subject of
debate and discussion and prominent economists like Marshall, Pigou, 
Fisher, etc., have defined national income in different ways. Not going 
into this controversy, the national income may be defined and calculated 
as follows : 

(a) As the net national product, i.e., as the aggregate of the net 
values added in all branches of economic activity during a specified 
period, together with the net income from abroad. 

(b) As the sum of the distributive shares, i.e., as the aggregate of
income payments accruing to the factors of production in a specified
period. These payments take the form of wages and salaries, profits,
interest, rents, etc.

(c) As net national expenditure, i.e., the sum of expenditure on 
final consumption goods and services plus domestic and foreign net 
investments. 


Thus the concept of national income can be viewed from three 
different angles, i.e., production, income and expenditure. All these 
approaches should give us the same final figure of national income. 
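The equality of the three approaches can be checked on a hypothetical closed economy with no government, no depreciation and no foreign trade: a farmer sells wheat to a miller, who sells flour to a baker, who sells bread to households. All figures are invented for illustration.

```python
# Hypothetical closed economy with no government, depreciation or foreign trade.
# Each firm: (name, sales, purchases of intermediate goods, wages paid, profit)
firms = [
    ("farmer", 100, 0, 60, 40),
    ("miller", 150, 100, 30, 20),
    ("baker", 200, 150, 35, 15),
]
final_consumption = 200  # households buy all the bread
net_investment = 0

# (a) production: sum of net values added by each firm
product = sum(sales - bought for _, sales, bought, _, _ in firms)
# (b) income: sum of distributive shares (wages plus profits)
income = sum(wages + profit for _, _, _, wages, profit in firms)
# (c) expenditure: final consumption plus net investment
expenditure = final_consumption + net_investment
print(product, income, expenditure)
```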


Utility of National Income Estimates 


The national income estimates are extremely useful in a number 
of ways. 


1. The detailed estimates of national income throw light on the
working of the economy. They show the contribution made by the
various branches of industry and trade to the national product. If the
contribution of any particular sector is less in comparison to the men
employed therein, one can investigate the causes responsible for this so
that suitable steps could be taken to increase the share of that sector. 
2. National income figures are indispensable in economic
planning. In fact in the absence of detailed estimates of national income
most of the economic policies would be a leap in the dark. Also a 
balanced economic development will not be possible in the absence of 
national income statistics. 


3. National income estimates throw light on the standard of
living of the people. If national income is increasing, other things being
equal (i.e., prices remaining constant, population remaining the same,
distribution of income remaining the same), the standards of living of
the people are improving. In fact, to get an idea as to how standards
are changing, the per capita income is computed.


4. Various Government departments need national income 
statistics to guide their current operations. The Treasury Department, 
for instance, uses them to estimate future tax receipts. The depart- 
ments of labour, trade, industry, transport, etc., obtain a great deal of 
information from national income statistics which help in formulating 
suitable policies. 


5. National income data are employed to evaluate the progress 
and performance of the plans of economic development. A dependable 
guide would be to see if the targeted rates of growth of national income,
per capita income, investment, consumption and output have been
realised.


6. A comparison of the rates of growth of national income, per
capita income, investment, etc., can be made over a period of time and
also over space. Inter-sectoral and inter-regional comparisons are of
great use in assessing existing policies and formulating new ones.


7. From an economic point of view, inter-regional or inter-State
variations in the distribution of incomes are very important. For
balanced economic growth it is necessary that there should be no
disparities in the distribution of incomes among various regions or States.
Regional distribution of income gives an idea of the progress of regions
over a period of time.
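Points 3 and 6 above rest on two simple calculations, sketched here with hypothetical figures: per capita income at constant prices (national income deflated by a price index with base 100, then divided by population) and the average annual compound growth rate used to compare regions or periods. Function names are illustrative only.

```python
def per_capita_income(national_income, population, price_index=100.0):
    """Per capita income at base-year (constant) prices.

    national_income is at current prices; price_index is 100 in the base year.
    """
    return national_income * 100 / price_index / population


def compound_growth_rate(start_value, end_value, years):
    """Average annual compound growth rate between two observations."""
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical figures: income up 32%, prices up 20%, population up 10%.
pci_start = per_capita_income(1000.0, 10.0, 100.0)
pci_end = per_capita_income(1320.0, 11.0, 120.0)
print(pci_start, pci_end)  # real per capita income has not changed

# Hypothetical per capita incomes of two regions over the same 5 years.
growth_a = compound_growth_rate(400.0, 500.0, 5)
growth_b = compound_growth_rate(600.0, 660.0, 5)
print(f"{growth_a:.2%} {growth_b:.2%}")
```

The first pair of figures shows why money income alone can mislead: a 32 per cent rise in national income leaves the standard of living unchanged once prices and population are allowed for.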

Methods of Estimating National Income 


There are three popularly known methods of calculating national
income, namely : 

1. The Income Method.

2. The Output Method. 

3. The Consumption-Savings Method. 


1. The Income Method. This method derives the national
income as the sum of net incomes received by individuals and business
enterprises. It relies primarily upon the large volume of data gathered
by income-tax authorities and sometimes upon special studies relating
to earnings of various occupational groups and to their family budgets.
The Statistical Office of the United Nations recommended the following
summary classification of national income by this method :


(a) Wages and Salaries. 
Private. 
Government Civilian.
Military. 

(b) Other labour income. 

(c) Income of unincorporated enterprises. 

(d) Corporate profits before taxes. 

(e) Profits of public enterprises. 

(f) Interests. 

(g) Net rents on lands and houses. 

(h) Net interest and dividends and other income from abroad. 

The sum of the above categories will give national income at 
factor cost. 

In a country like India where the proportion of tax-payers is very 
small this method alone will not be suitable for calculating national 
income because of the inadequacy of coverage. 

2. The Output or the Product Method. According to this
method national income is calculated by adding the net values of goods
and services in all branches of economic activity together with the net
income from abroad. It utilizes the large body of production and
trade statistics available in most of the principal countries of the world.
If the valuation of the output of goods and services is done at
market prices, the national income is said to be national income at
market prices, and if the valuation is done at prices which equal the
payments received by the various factors of production only, national
income is said to be national income at factor cost. The Statistical
Office of the United Nations recommended the following summary
classification of national income by this method :

(a) Agriculture, forestry, hunting and fishing.

(b) Mining and quarrying, manufacturing industry, electricity,
gas, water and sanitary services.

(c) Building and construction. 

(d) Transportation and Communication. 

(e) Commerce.

(f) Banking, Insurance and real estate. 

(g) Government services.

(h) Miscellaneous services. 

(i) Net income from abroad. 

This method is more widely used in underdeveloped economies 
but one must guard against the possibilities of omissions and duplication 
which are greater in this method. 

Precautions in the use of this method. Great care must be exercised 
while adopting the product method otherwise the estimate of national 
income obtained may be highly misleading. Some points worth consi- 
dering are : 

1. National income should always be expressed with reference to
particular years and should be stated at current prices and at
constant prices separately to avoid confusing the one for the other.
2. In order to avoid double counting, only the final products
(excluding raw materials and intermediate products) should be taken
into account.


3. The method should be applied only to such sectors as give
rise to the production of concrete goods, such as mining, manufacturing,
agriculture, etc. It should not be applied to the service sector of the
economy, i.e., transport, commerce and communication.
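Precaution 2 can be illustrated with a hypothetical cotton, yarn and cloth chain. Adding up every sale counts the cotton and the yarn more than once, whereas summing the value added at each stage gives exactly the value of the final product. All figures are invented.

```python
# Hypothetical production chain: (stage, value of output sold, value of inputs bought).
chain = [
    ("cotton grower", 40, 0),
    ("spinning mill", 70, 40),
    ("weaving mill", 100, 70),
]

# Summing all sales counts the cotton and the yarn again inside later stages.
total_sales = sum(output for _, output, _ in chain)
# Summing value added (output minus purchased inputs) avoids double counting.
value_added = sum(output - inputs for _, output, inputs in chain)
final_product = chain[-1][1]  # the cloth bought by final users
print(total_sales, value_added, final_product)
```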


3. The Consumption-Savings Method. National income
according to this method equals the sum of expenditure on final
consumption goods and services plus domestic and foreign net investments.
Symbolically, as Prof. Keynes expressed it :

$$Y = C + S$$

where Y = total income, C = consumption, and S = savings.


This method is used less extensively than the other methods
because it calls for data not generally available. No country has as yet
complete, continuous and reliable series on the volume of consumer
expenditures and savings. At present the consumption-savings method
is used as a stop-gap when industrial or income statistics are badly
lacking.


Theoretically, all the three methods are related and should give
identical results. In practice, all the three methods are combined
because data regarding any one are not perfect. No doubt this is
logically defective, the results cannot be additive and some error is
introduced, but this error is much smaller than if only one method
based on scanty data was employed.


National Income at Market Price and at Factor Cost 
The following are the common concepts of national income: 
(1) Gross National Product, and 
(2) Net National Product.


1. Gross National Product (GNP). It is the total money value of all final goods (i.e., goods ready for consumption) and services produced by a country in a year, reckoned at current market prices, including depreciation and replacement allowances on capital assets. It is equal to gross national consumption plus gross national investment.


GNP may be reckoned at factor cost, i.e., at the cost of payments made to the various factors of production, or it may be reckoned at market price. When GNP is reckoned at factor cost, it is equal to GNP at market price to which subsidies are added and from which indirect taxes are deducted. Symbolically,

$$\text{GNP (Factor Cost)} = \text{GNP (Market Price)} + \text{Subsidies} - \text{Indirect Taxes}$$

$$\text{GNP (Market Price)} = \text{NDP (Factor Cost)} - \text{Subsidies} + \text{Depreciation Allowance} + \text{Indirect Taxes}$$
The gross national product at market prices represents prices actually paid by the final buyers. It is, therefore, more meaningful for economic analysis, for the study of trade cycles and for market research. Valuation at factor cost represents the amount received by producers for their products, and hence if the purpose is to study the allocation of resources, valuation at factor cost is better.


2. Net National Product (NNP). Net National Product is obtained by excluding from the GNP the total value of depreciation or replacement allowances. Symbolically,

$$\text{NNP} = \text{GNP} - D$$

where D = depreciation or replacement allowance. Further, NNP (Factor Cost) = NDP (Factor Cost) + Net Earned Income from Abroad. NNP is equal to net national consumption (NNC) plus net national investment (NNI). Symbolically,

$$\text{NNP} = \text{NNC} + \text{NNI}$$

NNP may be reckoned at factor cost or at market price. When NNP is reckoned at factor cost, it is equal to NNP at market price to which subsidies are added and from which indirect taxes are deducted. Symbolically,

$$\text{NNP (Factor Cost)} = \text{NNP (Market Price)} + \text{Subsidies} - \text{Indirect Taxes}$$


NNP at factor cost is also referred to as national income (NI), which is equal to net domestic product at factor cost plus net earned income from abroad (i.e., factor incomes earned abroad by residents less factor incomes paid to non-residents). Symbolically,

$$\text{NNP (Factor Cost) or NI} = \text{NDP (Factor Cost)} + \text{Net earned income from abroad}$$

When NNP is reckoned at market price, it is equal to NNP at factor cost plus indirect taxes, less subsidies and less net income from abroad. Symbolically,

$$\text{NNP (Market Price)} = \text{NNP (Factor Cost)} + \text{Indirect Taxes} - \text{Subsidies} - \text{Net income from abroad}$$

Illustration. The following data relate to the national economy of a country: 


                                              Rs. abja
1. Net National Output at factor cost            160
2. Net earned income from abroad                  -2
3. Depreciation allowances                        12
4. Indirect taxes                                8.2
5. Subsidies                                     0.4
6. Gross National Consumption                  175.6
7. Gross National Investment                     6.8


Find the following : 

(a) Net Domestic Product at Factor Cost 

(b) GNP at market price 

(c) GNP at factor cost 

(d) NNP at market price 

(e) NNP at factor cost. 

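Solution.

(a) $$\begin{aligned}\text{NDP}_{fc} &= \text{Net national output}_{fc} - \text{Net earned income from abroad}\\ &= 160 - (-2) = 162\end{aligned}$$

(b) $$\begin{aligned}\text{GNP}_{mp} &= \text{NDP}_{fc} - \text{Subsidies} + \text{Depreciation} + \text{Indirect taxes}\\ &= 162 - 0.4 + 12 + 8.2 = 181.8\end{aligned}$$

or $$\text{GNP}_{mp} = \text{NNP}_{mp} + \text{Depreciation} = 169.8 + 12 = 181.8$$

(c) $$\begin{aligned}\text{GNP}_{fc} &= \text{GNP}_{mp} + \text{Subsidies} - \text{Indirect taxes}\\ &= 181.8 + 0.4 - 8.2 = 174\end{aligned}$$

(d) $$\begin{aligned}\text{NNP}_{mp} &= \text{NNP}_{fc}\,(=\text{NI}) + \text{Indirect taxes} - \text{Subsidies} - \text{Net earned income from abroad}\\ &= 160 + 8.2 - 0.4 - (-2) = 169.8\end{aligned}$$

or $$\text{NNP}_{mp} = \text{GNP}_{mp} - \text{Depreciation} = 181.8 - 12 = 169.8$$

(e) $$\text{NNP}_{fc} = \text{GNP}_{fc} - \text{Depreciation} = 174 - 12 = 162$$

or $$\text{NNP}_{fc} = \text{NNP}_{mp} - \text{Indirect taxes} + \text{Subsidies} = 169.8 - 8.2 + 0.4 = 162$$

Note : All calculations are in Rs. abja.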
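The five steps of the solution can be checked mechanically. The sketch below is a minimal Python version, assuming the figures listed in the illustration and applying exactly the formulas stated in this section, including the way net earned income from abroad enters steps (a) and (d). The function and variable names are illustrative only. `Decimal` is used so that figures such as 8.2 and 0.4 add up without floating-point rounding.

```python
from decimal import Decimal

# Figures from the illustration, in Rs. abja.
NNP_FC = Decimal("160")            # net national output at factor cost
NET_INCOME_ABROAD = Decimal("-2")  # net earned income from abroad
DEPRECIATION = Decimal("12")
INDIRECT_TAXES = Decimal("8.2")
SUBSIDIES = Decimal("0.4")


def solve_illustration():
    """Return answers (a)-(e) using the formulas as stated in the text."""
    ndp_fc = NNP_FC - NET_INCOME_ABROAD                           # (a)
    gnp_mp = ndp_fc - SUBSIDIES + DEPRECIATION + INDIRECT_TAXES   # (b)
    gnp_fc = gnp_mp + SUBSIDIES - INDIRECT_TAXES                  # (c)
    nnp_mp = (NNP_FC + INDIRECT_TAXES - SUBSIDIES
              - NET_INCOME_ABROAD)                                # (d)
    nnp_fc = gnp_fc - DEPRECIATION                                # (e)

    # The alternative ("or") routes in the worked solution must agree.
    assert gnp_mp == nnp_mp + DEPRECIATION
    assert nnp_fc == nnp_mp - INDIRECT_TAXES + SUBSIDIES

    return {"NDP_fc": ndp_fc, "GNP_mp": gnp_mp, "GNP_fc": gnp_fc,
            "NNP_mp": nnp_mp, "NNP_fc": nnp_fc}


for name, value in solve_illustration().items():
    print(name, value)
```

Running it prints the same five answers as the worked solution; if any input figure is changed, the internal assertions confirm that the two routes given for (b) and (e) still agree.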


Estimates of National Income 


No official estimates of national income are available for the period before Independence. Whatever estimates had been obtained were the result of the pioneering efforts of some individuals. For the sake of convenience, the estimates of national income are divided under two heads :

1. Estimates of national income before Independence.

2. Estimates of national income after Independence.


Estimates before Independence. The first attempt to estimate the national income of India was made by Dadabhai Naoroji in the year 1876. He worked out a figure of per capita income* of Rs. 20 for the year 1868. Several other persons also attempted to calculate India's national income in the nineteenth and twentieth centuries, such as Lord Curzon, William Digby, Shah and Khambatta, Findlay Shirras, Dr. V.K.R.V. Rao, etc. The estimates of national income given by some of these prominent people are set out in the following table :


Estimating Authority          Year to which the       Per capita
                              estimate relates        income (Rs.)
1. Dadabhai Naoroji           1867-68                 20
2. Lord Curzon                1897-98                 30
3. William Digby              1898-99                 18.9
4. Lord Curzon                1901                    30
5. Vakil and Muranjan         1910-14                 58.5
6. Wadia and Joshi            1913-14                 44.5
7. Shah and Khambatta         1921-22                 67
8. Findlay Shirras            1921-22                 116
9. V.K.R.V. Rao               1925-29                 80
10. Do                        1931-32                 65
11. Do                        1942-43                 114


* $$\text{Per capita income} = \frac{\text{Total National Income}}{\text{Total Population}}$$

While going through the above estimates it should be remembered
that they are not strictly comparable because of the following reasons : 


1. Differences in geographical coverage. The coverage was not uniform throughout. For example, Findlay Shirras covered the whole of undivided India in his first two estimates and included only undivided British India in his third estimate for the year 1921.


2. Differences in concepts and procedures employed. The methods followed by different authors were different. For example, some included services whereas others did not. Also, different authors were guided by different considerations in making their estimates. For example, in the official estimates, non-agricultural incomes were assumed to be 40 per cent of agricultural incomes; Findlay Shirras put this figure at 50 per cent, Wadia and Joshi at 30 per cent and Shah and Khambatta at as low a figure as 10 per cent only. A classic example of this arbitrariness can be seen in the fact that for 1931-32, whereas Mr. R.C. Desai accepts Rs. 480 crores and Rs. 115 crores as the contributions of fruits and vegetables and of tobacco respectively, Dr. V.K.R.V. Rao includes figures of Rs. 70 crores and Rs. 14 crores respectively. Because of such arbitrary assumptions by the authors, questions have been raised about the reliability of these estimates.


3. Estimates are in current prices. These estimates are not 
strictly comparable because they are in current prices and relate to 
different dates. The price level has undergone a great change during 
this period and no series at constant prices had been obtained. 


4. Unreliability of agricultural statistics. These estimates were based on statistics from the agricultural sector which were highly undependable, as there was no regular agency for the collection of statistics.


Despite these limitations, credit must be given to the pioneering efforts of those individuals who courageously took up the task of estimating national income, given the poor state of economic organisation, a virtual absence of organised statistics and a foreign government.


Amongst the estimates of the various authors, special mention 
must be made of the work of Dr. V.K.R.V. Rao whose study consti- 
tuted the first scientific attempt and aroused the interest of the 
government and eminent scholars in this hitherto neglected field of
national income estimation.


Estimates of National Income after Independence. Soon after Independence the Government of India appointed the National Income Committee on 4th August, 1949 under the chairmanship of Prof. P.C. Mahalanobis. Other members of the Committee were Prof. D.R. Gadgil and Dr. V.K.R.V. Rao. The Committee was entrusted with the task of preparing a report on the national income and related estimates, suggesting measures for improving the quality of the available data and for the collection of further essential statistics, and recommending ways and means of promoting research in the field of national income. The Committee presented its first report on 15th April, 1951 and its final report on 14th February, 1954. These reports are a landmark in the history of this country because for the first time they provided comprehensive data on national income for the whole of India. Following up on the work of the National Income Committee, the Central Statistical Organisation of the Government of India started producing annual official estimates of the national income of the country from 1955. The final report of the National Income Committee concluded with the following paragraph :


“While we have not been able to accomplish much, we feel that we have at least laid down a proper foundation for the work of national income estimation in the country. We hope that rapid strides will now be made, and we envisage that, in the near future, accumulated knowledge in the field will be sufficient for planning purposes or policy decisions.”


The first report of the Committee, submitted in April 1951, placed the estimate of national income for 1948-49 (Net National Product at Factor Cost) at Rs. 8,710 crores and gave the per capita


final report contained estimates of national income for the years 1948-49, 1949-50 and 1950-51 at current prices.


National Income Committee and CSO Estimates 


For the post-independence period, we have two series of national income estimates :


1. Conventional Series, which provides national income data at current prices and at 1948-49 prices for the period 1948-49 to 1964-65.


2. Revised Series, which provides national income data at current prices and at 1960-61 prices for the period 1960-61 onwards.


Conventional Series*. The conventional estimates of national income were prepared by following the product as well as the income method. The different methods followed for different sectors have been largely governed by the availability of reliable and adequate data. For the purpose of preparing the conventional estimates of national income, the economy was divided into the following sectors :


(i) Agriculture
(ii) Animal husbandry
(iii) Forestry
(iv) Mining
(v) Factory establishments
(vi) Fishery
(vii) Small enterprises
(viii) Organised banking and insurance


(ix) Other commerce and transport
(x) Profession and liberal arts and domestic service
(xi) Public authorities
(xii) House property, and
(xiii) Balance of payments and net income from abroad.

* Estimates based on the methodology given in the First and Final Reports of the National Income Committee are termed ‘conventional’ estimates to distinguish them from the ‘revised’ estimates released recently by the CSO.


The estimates of net product from the agriculture, animal husbandry, forestry, mining and factory establishment sectors were obtained by following the product approach, in which the gross value of output was estimated and from it was deducted the value of various raw materials, service inputs and depreciation of assets used in the process of production. For sectors like small enterprises, commerce and transport, profession and liberal arts and domestic services, and house property, the income approach was adopted, which consists in multiplying the average net earnings per person by the total estimated number of persons engaged in the respective sectors. While estimating the average net earnings per person, all factor incomes were taken into account. The estimates of national income in respect of the banking and insurance sector, public authorities, etc., were based on the analysis of the budget documents of the Central and State Governments, the Profit and Loss Accounts of the insurance companies, etc.
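The two approaches described above reduce to simple arithmetic per sector. The Python sketch below is illustrative only: the function names are hypothetical and every figure is invented for the example, not taken from the National Income Committee's estimates.

```python
def net_product_by_product_approach(gross_output, raw_materials,
                                    service_inputs, depreciation):
    """Product approach: gross value of output less the value of raw
    materials, service inputs and depreciation of the assets used."""
    return gross_output - raw_materials - service_inputs - depreciation


def net_product_by_income_approach(avg_net_earnings, persons_engaged):
    """Income approach: average net earnings per person (all factor
    incomes included) times the estimated number of persons engaged."""
    return avg_net_earnings * persons_engaged


# Hypothetical figures, in Rs. crores, for illustration only.
agriculture = net_product_by_product_approach(
    gross_output=5000, raw_materials=900, service_inputs=150,
    depreciation=250)

# Hypothetical: 20 lakh persons in small enterprises earning Rs. 1,500 a
# year each on average; dividing by 10**7 converts rupees into crores.
small_enterprises = net_product_by_income_approach(1500, 2_000_000) / 10**7

print(agriculture, small_enterprises, agriculture + small_enterprises)
```

Sector estimates obtained by either route are then added together, which is why each method must be applied only to the sectors assigned to it: applying both to the same sector would count its net product twice.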


Revised Series. The estimates of net national product at current and constant 1948-49 prices brought out by the Central Statistical Organisation till 1966 were generally based on the same concepts, statistical source material and methodology as described in the first and final reports of the National Income Committee. The final report also contained various recommendations for the improvement of basic statistics used in national income estimation.


For securing a better empirical basis for measuring the national product from various industries, fresh efforts were made for a comprehensive collection of available data, both published and unpublished. The results of the preliminary efforts made in this direction were presented in “National Income Statistics: Proposals for a revised series of National Income Estimates of 1955-56 to 1959-60”, brought out by the Central Statistical Organisation in May, 1961. A number of other studies subsequently conducted made it possible to compile the revised series of national product at current and constant (1960-61) prices for the period 1960-61 to 1964-65. These are presented in the publication ‘Brochure on Revised Series of National Product* for 1960-61 to 1964-65’. The revised series is presented in the brochure in an abridged form, but it is proposed to bring out soon a comprehensive publication setting out the details in respect of the estimates, the source material used and the methodology followed for each industry group.
As in the case of the conventional series, the revised estimates of national product in respect of the following heads have been prepared along with the estimation of the working force for individual industries of the economy :
1. Agriculture,
2. Forestry and Logging,
3. Fishing,
4. Mining and quarrying,
5. Large-scale manufacturing,
6. Small-scale manufacturing,
7. Construction,
8. Electricity, gas and water supply,
9. Transport and communication,
10. Trade, storage, hotels and restaurants,
11. Banking and insurance,
12. Real estate and ownership of dwellings,
13. Public administration and defence,
14. Other services, and
15. External transactions.

* In the present brochure use has been made of the expression ‘national product’ in place of ‘national income’, which means one and the same thing at the aggregate level.


Here also, as in the case of the conventional series, the product approach has been employed in the commodity producing sectors, i.e., agriculture, forestry and logging, fishing, mining and quarrying, and large-scale manufacturing, whereas for other sectors like small-scale manufacturing, electricity, gas and water supply, transport and communication, trade, storage, hotels and restaurants, and real estate and ownership of dwellings, the income approach has been employed. For the construction industry, estimates have been prepared by following both the commodity flow and the expenditure methods. For other sectors, the

Insurance Companies for Banking and Insurance Sector. Estimates of 


2. Use of the results of the 1961 population census instead of Plan statistics relating to additional employment.


4. The estimates of national product at constant (1960-61) prices have been further improved.


however, relate to (a) the broad sector—agriculture, where the all-India 
ise estimates which 
are based on the fully revised estimates of outturn of agricultural commodities, revised yield rates of minor crops, livestock products and agricultural by-products, a wider empirical base for data on prices and use of the latest data on cost deductions; (b) Large-scale manufacturing, where the detailed data thrown up by the Annual Survey of Industries together with the provisional index of industrial production with 1960 as the base have been used; (c) Unorganised sectors like small-scale manufacturing, transport other than railways, trade, hotels and restaurants, and other services, wherein the working force estimates have been obtained from the National Sample Survey data supplemented by the results of other available surveys and studies and the 1961 census of population; (d) Construction, where expenditure and commodity flow approaches instead of the income approach have been adopted; (e) Real estate and ownership of dwellings, where the estimates have been prepared on the basis of the number of residential houses reported in the 1961 census of population; (f) Public administration and defence, where the scope of this industry has been narrowed to the extent practicable with a view to excluding all government activities other than administrative and regulatory activities. The base year for compiling the constant price estimates in the revised series is 1960-61 in place of 1948-49 adopted for the earlier series, now termed the conventional series. This is mainly because 1960-61 is a more recent benchmark year for which maximum data are available.*


Estimates of National Income 


The latest statistics on national income of India relate to the year 1974-75. In 1976, the Central Statistical Organisation, Department of Statistics, Ministry of Planning, Government of India, brought out a brochure entitled “National Accounts Statistics”. This publication deals with the country’s national accounts for the period 1960-61 to 1974-75. Apart from presenting the estimates of national product, it also gives details of consumption, saving and capital formation and transactions of the public sector.


The net national product at current prices in 1974-75 is estimated at Rs. 58,137 crores compared to Rs. 49,396 crores for 1973-74. The corresponding per capita income is Rs. 989 in 1974-75 as against Rs. 856 in 1973-74. At current prices the increase in national product is 17.7 per cent, whereas the per capita income has increased by 15.5 per cent. The national product for 1974-75 at constant prices is estimated at Rs. 20,183 crores as against Rs. 20,143 crores in 1973-74, a nominal increase of 0.2 per cent. Due to an increase in population of about 2 per cent during the same period, the per capita income registered a fall of 1.7 per cent. It is estimated at Rs. 343 in 1974-75 and Rs. 349 in 1973-74.*
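The percentage changes quoted above can be re-derived from the rupee figures in the same paragraph. A minimal Python sketch; the helper name `pct_change` is hypothetical, and the inputs are only the 1974-75 and 1973-74 figures given in the text.

```python
def pct_change(new, old):
    """Percentage change from the old value to the new value."""
    return (new - old) / old * 100


# (1974-75 value, 1973-74 value) as quoted in the text.
figures = {
    "NNP at current prices (Rs. crores)": (58137, 49396),
    "Per capita income at current prices (Rs.)": (989, 856),
    "NNP at constant prices (Rs. crores)": (20183, 20143),
    "Per capita income at constant prices (Rs.)": (343, 349),
}

for label, (new, old) in figures.items():
    print(f"{label}: {pct_change(new, old):+.1f}%")
```

Run as written, it prints +17.7%, +15.5%, +0.2% and -1.7%, matching the rounded figures in the paragraph. The constant-price per capita series falls even though total constant-price output rises, because output grew more slowly than population.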

* National Accounts Statistics 1960-61 to 1974-75, C.S.O., Government of India, p. (xxx).

Difficulties in the Estimation of India's National Income

The task of computing national income is a gigantic one, and every country, however advanced, faces some difficulties in computing national income. However, the underdeveloped economies are faced with some special difficulties which arise because of the peculiar nature of such economies. In the case of our country, the important problems or difficulties that are faced while calculating national income are :


1. A large unorganised and non-monetized sector. While calculating national income, the assumption normally made is that the bulk of the commodities and services produced are exchanged for money. In the case of India, however, a considerable portion of output does not come into the market at all, being either consumed by the producers themselves or bartered for other commodities or services. The estimates by the Reserve Bank suggest that in India the non-monetized sector is more than one-third of the Indian economy. Since transactions in this sector do not come within the purview of money (there being barter

market for sale and is consumed by the producers themselves. There is 
no suitable formula by which the output which is not exchanged for 
money may be evaluated in terms of money. If this portion is ignored, 


be formed. Statistics in India, particularly for the unorganised sector, are not only inadequate but also very unreliable. There are no statistics available regarding income, capital investments, consumer expenditures, hoarding, customs and excise, domestic employment, cottage industries, etc. The main collecting agency in the village is either the patwari or the gram sewak. Both of these are semi-literate and untrained in the art of collection of data. Moreover, the collection of data is not their primary function. As a result, the statistics collected cannot be relied upon very much. As pointed out by the National Income Committee, “The relative dearth of material, both statistical and analytical in the national income field, in India, is part of the vicious circle characteristic of an undeveloped economy, poverty leading to perpetuation of poverty.” It is heartening to note that after


mation on the various aspects of the Indian economy. Several State Governments have set up Statistical Bureaux. Statistics of production and output on which national income estimates are based are now available in a much better form in India. However, much remains to be done in this respect.


3. Absence of accounting records. The problem of measurability is further aggravated by the fact that most producers do not generally maintain any reliable records either of the quantity or of the value of their output. Most of these producers carry on production at the family level or run household enterprises on a very small scale. Illiteracy, lack of training and the general absence of the practice of keeping accounts prevent the vast majority of cultivators and small household producers from keeping proper accounts of their respective activities. Not only this, people in our country are not yet alive to the need for accurate statistics. As a result, they are not in a mood to supply information. They do not co-operate with the schemes of the Government for collecting data and are even hostile to them. The quality of statistics is, therefore, adversely affected. Thus the peculiar psychology of the people is an additional difficulty in the calculation of national income. Commenting on the non-availability of data, the National Income Committee pointed out : “An element of guesswork inevitably enters the assessment of output, especially in the large sectors of the economy which are dominated by the small producer or the household enterprise.”


4. Regional diversities. Diversities in different parts of the country are so great that data which are collected in a particular region cannot be used to draw conclusions regarding any other region or regarding the economy as a whole. Our country is like a sub-continent, and there are great diversities in the climatic conditions, tastes, habits, customs and environments of the people of the different States. Just as statistics collected in Japan cannot be used to draw conclusions about Canada, so statistics collected in Maharashtra cannot be utilised for drawing conclusions about Bihar or U.P. or any other part of the country. The great heterogeneity in different parts of the country makes the collection of statistics necessary for the whole of the country. Even sampling techniques have to be used with great caution. To quote the National Income Committee Report, “Regional diversities in India with its size and varied history are large; and inadequacy of data cannot easily be overcome by extending data for one part of the country to the rest of the country.”


5. Overlapping of economic functions. A major difficulty in the calculation of national income in our country is the lack of a clear-cut line of demarcation between different occupations. Most of the people in our country, particularly in the rural areas, follow more than one occupation, and it becomes difficult to allocate their income to different occupations. This difficulty arises because agriculture in India is seasonal, and during the off-season the agriculturist engages himself in a number of pursuits, working as a barber, carpenter or labourer, or else he often migrates to the town and works as a domestic servant, a rickshaw-puller, a hawker or an industrial labourer. As a result, it is not possible to arrive at a suitable occupational classification, and the computation of national income by industrial origin becomes very difficult.

Recent Improvements in National Income Estimates


It is heartening to note that serious efforts are being made to remove these difficulties. Periodically a conference on national income is held to consider the various methodological improvements in the calculation of national income in India. The advice of experts from India and abroad is being sought on various problems. In this way great headway is being made in making these estimates about India more realistic and accurate. Amongst the important steps taken by the Government to improve the national income statistics, special mention may be made of the following :


1. Setting up of the National Income Committee in 1949. This is a landmark in the history of national income statistics of our country.


2. The creation of the National Income Unit in the Central Statistical Organisation, with the responsibility of compiling national income data and bringing about continuous improvements in them.


3. With a view to filling in the statistical gaps, a number of new studies have been undertaken both by the CSO and by other institutions. Important amongst these are : estimates of gross capital formation in India for 1948-49 to 1960-61; preparation of the ‘Input-Output’ table by the Indian Statistical Institute; the monograph on ‘Savings in India 1948-49 to 1957-58’ by the NCAER; studies on savings and finances of public and private limited companies; and estimates of net tangible wealth of India (1949-50 to 1960-61 and 1960-61 to 1965-66; see RBI Bulletin, Oct. 1972) by the Reserve Bank of India.


4. In order to reduce the time-lag of 10 to 12 months between the expiry of the financial year and the publication of the white paper on national income, the CSO started issuing quick estimates from the year 1958-59 onwards. The latest ‘quick estimates’, relating to the year 1972-73, have been recently released.


5. The CSO has set up a working group on State Incomes, and efforts in this direction have improved the coverage, regularity and accuracy of the work done by the State Statistical Bureaux.


6. From 1961 onwards, budget documents of public authorities have been used to provide not only the ‘actuals’ but also the ‘revised estimates’, so that up-to-date information is provided by the National


year's weighted all-India average prices on the basis of the Economic 
Adviser's wholesale price index number of agricultural commodities. 


dual businesses to keep records of their transactions, so it is most 


place in the
national economy as a whole. These records are called social accounts. 


These accounts provide information about the structure and functioning of the economy. If appropriate classifications are made, information can be derived about the annual income of the nation, how it is produced, distributed and spent, how the wealth of the nation is being built up, and so on. Such information provides a basis for national economic policy; it helps governments in their attempts to maintain economic stability and prosperity, and to ensure an efficient distribution of economic resources and a balanced growth of the economy. Not only this, empirical investigation of the working of an economy depends on the availability of data about aggregates of transactions of the kind recorded in social accounts.


It was after the publication of Lord Keynes's “The General Theory of Employment, Interest and Money” in 1936 that social accounting received attention, although its recent development is associated with the names of Mr. J.R.N. Stone and Prof. Meade. The General Theory of Employment, Interest and Money provides the basis for the development of the technique of social accounting, as it sets out relationships between various aggregates in such a way as to give impetus to social accounting studies.


The income and the product methods of measuring national 
income enable us to study the wealth and income of the people over a 
period of time and to compare it with the wealth and incomes of the 
other communities of the world. But social accounts are kept on a system of double-entry book-keeping, showing receipts and disbursements under each head of account, presented in the form of graphical equations derived from an inflow and outflow matrix, or in the form of a table.
As beautifully pointed out by Edey and Peacock : 

“Social Accounting, embracing national income accounting, is concerned with the statistical classification of the activities of human beings and human institutions in ways which help us to understand the operation of the economy as a whole. It embraces not only the classification of economic activity, but also the application of the information thus assembled to the investigation of the operation of the economic system.”

Social accounts are so drawn as to bring out the distinction between (i) forms of economic activity, i.e., production or consumption or accumulation of wealth, (ii) sectors or institutional sub-divisions of the economy, i.e., private or public sector, and (iii) types of transactions, i.e., sales, purchases, gifts, taxes, etc.
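Double-entry means that the two sides of each account must balance. As a minimal sketch, the Python function below computes both sides of an account whose item headings are modelled on the Domestic Product account set out in this section, and reports the discrepancy between them. The function name and all figures are hypothetical, chosen only so that the account balances.

```python
def domestic_product_account(gdp_factor_cost, indirect_taxes, subsidies,
                             private_consumption, government_consumption,
                             fixed_capital_formation, increase_in_stocks,
                             exports, imports):
    """Return both sides of a domestic product account.

    Product side: GDP at factor cost + indirect taxes - subsidies.
    Expenditure side: consumption + fixed capital formation
    + increase in stocks + exports - imports.
    Under double-entry the two sides must be equal.
    """
    product_side = gdp_factor_cost + indirect_taxes - subsidies
    expenditure_side = (private_consumption + government_consumption
                        + fixed_capital_formation + increase_in_stocks
                        + exports - imports)
    return product_side, expenditure_side


# Hypothetical figures (Rs. crores), chosen so that the account balances.
product, expenditure = domestic_product_account(
    gdp_factor_cost=900, indirect_taxes=110, subsidies=10,
    private_consumption=700, government_consumption=100,
    fixed_capital_formation=150, increase_in_stocks=20,
    exports=60, imports=30)

discrepancy = product - expenditure
print(product, expenditure, discrepancy)
```

With real data the discrepancy is rarely zero; a non-zero value is exactly the kind of error that, as noted below, social accounts help to reveal and adjust.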


Uses of the Social Accounting Framework 

Economists have always been interested in aggregates such as national income, but the systematic treatment of aggregated transactions in a social accounting framework dates only from about 1940. The following are the uses of the social accounting system :

1. It enables the structure of economic transactions to be set out in a consistent way and makes clear the dependence of the definition of any given aggregate on the particular system chosen. It helps to elucidate the relations between concepts, e.g., the distinction between gross national product at market prices and net national product at factor cost. It readily reveals the effects of any change in the treatment of particular items on the various aggregates of transactions.


2. Social accounts facilitate the actual estimation of the aggregates of transactions. They indicate alternative routes to the estimation of a given concept, e.g., the estimation of gross national product by the production, expenditure and income received methods. Any discrepancies and errors are revealed, and a basis is provided for adjusting them.


3. Social accounts provide a framework for the classification of transactions and hence suggest the form in which data should be collected. Also, movements in gross national product valued at constant prices and expressed per head of the population give an indication of movements in the standard of living. Similarly, movements in gross national product, valued at constant prices and expressed per head of the working population, give an indication of movements in the level of productivity.

4. The social accounts give an ex-post picture of the outcome of economic activity. Ex-post measurements relate to events which have actually taken place; they are a record of what has happened. The accounts can also be used as a framework for drawing up an ex-ante forecast of the likely outcome of the economy in the future. Ex-ante measurements relate to expectations, intentions, plans, forecasts, etc.


5. The social accounts method gives a better understanding of the state of the nation’s economy, its functioning, its break-up and how the

Social Accounting in India


In our country we do not have adequate statistics to build a meaningful system of social accounts. The National Income Committee as early as in 1948 pointed out that “Social accounting as a method of


The NIC adopted a classification of the Indian economy into the following basic sectors :

(i) Enterprise,
(ii) Household, and
(iii) Government.
Each of the three sectors was further sub-divided into four heads as follows :

(a) Production,

(b) Consumption,

(c) Addition to wealth, and

(d) External accounts.

The entire framework of social accounts, therefore, consists of 
twelve accounts in all comprising three major sectors of the economy 


classified further into four sub-classes. The method of presentation in 
a tabular form is as follows : 


FRAMEWORK FOR SOCIAL ACCOUNTS

                          Sectors of Economic Activities
Accounting            Enterprises     Household     Government
Production
Consumption
Addition to wealth
External accounts


ACCOUNT 1: DOMESTIC PRODUCT

1.1 Gross domestic product at factor cost (2.9)
1.2 Indirect taxes (5.7)
1.3 Less subsidies (5.2)
    Gross domestic product at market prices

1.4 Private consumption expenditure (4.1)
1.5 Central Government consumption expenditure (5.1)
1.6 Gross domestic fixed capital formation (3.1)
1.7 Increase in stocks (3.2)
1.8 Exports of goods and services (6.1)
    Expenditure on gross domestic product and imports
1.9 Less imports of goods and services (6.3)
    Expenditure on gross domestic product

ACCOUNT 2: NATIONAL INCOME

2.1 Compensation of employees (4.5)
2.2 Income from farms, professions and other unincorporated enterprises (4.6)
2.3 Income from property (4.7)
2.4 Saving of corporations (3.4)
2.5 Direct taxes on corporations (5.8)
2.6 General government income from property and entrepreneurship (5.5)
2.7 Less interest on the public debt (5.6)
2.8 Less interest on consumers' debt (4.8)
    National income

2.9 Gross domestic product at factor cost (1.1)
2.10 Net factor income payments from the rest of the world (6.2)
2.11 Less provisions for the consumption of fixed capital (3.3)
    Net national product at factor cost
ACCOUNT 3: DOMESTIC CAPITAL FORMATION

3.1 Gross domestic fixed capital formation (1.6)
3.2 Increase in stocks (1.7)
    Gross domestic capital formation

3.3 Provisions for the consumption of fixed capital (2.11)
3.4 Savings of corporations (2.4)
3.5 Net capital transfers from households and private non-profit institutions (4.11)
3.6 Net capital transfers from general government (5.11)
3.7 Net international transfers received by corporations (6.6)
3.8 Net borrowing of corporations: -(4.14 plus 5.15 plus 6.9)
    Finance of gross domestic capital formation


ACCOUNT 4: HOUSEHOLDS AND PRIVATE NON-PROFIT INSTITUTIONS

Current Account

4.1 Consumption expenditure (1.4)
4.2 Direct taxes (5.9)
4.3 Other current transfers to general government (5.10)
4.4 Savings (4.12)
    Disposal of income

4.5 Compensation of employees (2.1)
4.6 Income from farms, professions and other unincorporated enterprises (2.2)
4.7 Income from property (2.3)
4.8 Less interest on consumers' debt (2.8)
4.9 Current transfers from general government (5.3)
    Income of households and private non-profit institutions

Capital Reconciliation Account

Disbursements
4.10 Net capital transfers to general government (5.13)
4.11 Net capital transfers to domestic capital formation (3.5)

Receipts
4.12 Savings (4.4)
4.13 Net international transfers received (6.7)
4.14 Net borrowing: -(3.8 plus 5.15 plus 6.9)


ACCOUNT 5: GENERAL GOVERNMENT

Current Account

5.1 Consumption expenditure (1.5)
5.2 Subsidies (1.3)
5.3 Current transfers to households (4.9)
5.4 Savings (5.12)
    Disposal of current revenue

5.5 Income from property and entrepreneurship (2.6)
5.6 Less interest on the public debt (2.7)
5.7 Indirect taxes (1.2)
5.8 Direct taxes on corporations (2.5)
5.9 Direct taxes on households (4.2)
5.10 Other current transfers from households (4.3)
    Current revenue

Capital Reconciliation Account

Disbursements
5.11 Net capital transfers to domestic capital formation (3.6)

Receipts
5.12 Savings (5.4)
5.13 Net capital transfers from households (4.10)
5.14 Net international transfers received (6.8)
5.15 Net borrowing: -(3.8 plus 4.14 plus 6.9)


ACCOUNT 6: EXTERNAL TRANSACTIONS
(REST OF THE WORLD ACCOUNT)

Current Account

6.1 Exports of goods and services (1.8)
6.2 Net factor income payments to the nation (2.10)
    Current receipts from abroad

6.3 Imports of goods and services (1.9)
6.4 Surplus of the nation on current account (6.5)
    Disposal of current receipts from abroad

Capital Reconciliation Account

Receipts
6.5 Surplus of the nation on current account (6.4)
6.6 Net international transfers to corporations (3.7)
6.7 Net international transfers to households (4.13)
6.8 Net international transfers to general government (5.14)

Disbursements
6.9 Net lending to the rest of the world: -(3.8 plus 4.14 plus 5.15)
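Every entry in the standard accounts carries, in brackets, the number of the matching entry elsewhere in the system, so each transaction is recorded twice. The short sketch below checks that property mechanically for Accounts 1 and 2 and their counterparts. The dictionary is transcribed from the accounts in this section and the function name `unmatched` is hypothetical; the net borrowing and net lending items (3.8, 4.14, 5.15 and 6.9), which point to several entries at once, are left out.

```python
# Cross-references for Accounts 1 and 2 and their counterparts:
# key = item number, value = the item number printed in brackets beside it.
REFS = {
    # Accounts 1 and 2
    "1.1": "2.9", "1.2": "5.7", "1.3": "5.2", "1.4": "4.1", "1.5": "5.1",
    "1.6": "3.1", "1.7": "3.2", "1.8": "6.1", "1.9": "6.3",
    "2.1": "4.5", "2.2": "4.6", "2.3": "4.7", "2.4": "3.4", "2.5": "5.8",
    "2.6": "5.5", "2.7": "5.6", "2.8": "4.8", "2.9": "1.1", "2.10": "6.2",
    "2.11": "3.3",
    # Counterpart entries in Accounts 3 to 6
    "5.7": "1.2", "5.2": "1.3", "4.1": "1.4", "5.1": "1.5", "3.1": "1.6",
    "3.2": "1.7", "6.1": "1.8", "6.3": "1.9", "4.5": "2.1", "4.6": "2.2",
    "4.7": "2.3", "3.4": "2.4", "5.8": "2.5", "5.5": "2.6", "5.6": "2.7",
    "4.8": "2.8", "6.2": "2.10", "3.3": "2.11",
}


def unmatched(refs):
    """Return, sorted, the items whose bracketed counterpart does not point back to them."""
    return sorted(item for item, other in refs.items() if refs.get(other) != item)


print(unmatched(REFS))  # []

# A misprint such as (2.4) instead of (2.5) against item 5.8 is caught at once.
misprinted = {**REFS, "5.8": "2.4"}
print(unmatched(misprinted))  # ['2.5', '5.8']
```

Item numbers are kept as strings so that 2.1 and 2.10 remain distinct.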


Commenting on the relationships among the entries in the accounts, the U.N. Series F, No. 2, Rev. 1 gives the following derivations :


(i) National income = Net domestic product at factor cost plus net factor income from the rest of the world.

(ii) National income = Consumption expenditure plus net domestic capital formation plus net exports plus net factor income from the rest of the world less indirect taxes net of subsidies.

(iii) National income = Consumption expenditure plus net domestic capital formation plus surplus of the nation on current account less net current transfers from the rest of the world less indirect taxes net of subsidies.
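The three derivations can be checked against one another with a small worked example. All figures below are hypothetical. For (iii) to agree with (ii), the surplus of the nation on current account has to include net current transfers from the rest of the world, and the code assumes exactly that. "Consumption" here stands for private plus government consumption expenditure.

```python
# Hypothetical aggregates (arbitrary units), chosen only to illustrate the identities.
consumption = 800.0            # private plus government consumption expenditure
net_capital_formation = 150.0  # net domestic capital formation
exports = 120.0
imports = 140.0
net_factor_income = -10.0      # net factor income from the rest of the world
net_current_transfers = 15.0   # net current transfers from the rest of the world
indirect_taxes = 90.0
subsidies = 20.0

net_indirect_taxes = indirect_taxes - subsidies
net_exports = exports - imports

# Net domestic product at factor cost = expenditure at market prices less net indirect taxes.
ndp_factor_cost = consumption + net_capital_formation + net_exports - net_indirect_taxes

# Assumed definition: surplus on current account includes net current transfers.
current_surplus = net_exports + net_factor_income + net_current_transfers

ni_1 = ndp_factor_cost + net_factor_income                                              # (i)
ni_2 = consumption + net_capital_formation + net_exports + net_factor_income - net_indirect_taxes  # (ii)
ni_3 = (consumption + net_capital_formation + current_surplus
        - net_current_transfers - net_indirect_taxes)                                   # (iii)

print(ni_1, ni_2, ni_3)  # 850.0 850.0 850.0
```

Subtracting net current transfers in (iii) simply removes from the current-account surplus the part that is not net exports or net factor income, which is why the three routes coincide.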


n" As is evident in order to follow this set of Standard Accounts we 
- * will have to do a lot with regard to the availability of reliable data on 
many of the aspects on which no data or very little data at present are 
available. 


SUGGESTED READINGS


National Income Committee Report, Government of India, Volumes I & II.

National Income and Social Accounting, Edey and Peacock.

Applied Statistics for Economists, Karmel, Ch. 11.

A System of National Accounts and Supporting Tables, United Nations, Series F, No. 2.

National Income of India : Trends and Structure, M. Mukherjee.

National Accounts Statistics, 1960-61 to 1972-73, C.S.O. (January 1975).


TRY YOURSELF 


1. Write a detailed note on national income statistics in India.
(B. Com., Punjab, 1973)


2. Explain the concept of 'National Income' and describe briefly the official method of estimating the National Income of India. (B. Com., Punjab, 1970)

3. Define national income. Discuss briefly the various methods followed for its calculation in India. (B. Com., Nagpur, 1971)


4. Discuss the limitations of national income figures as indicators of economic growth. Illustrate your answer with reference to Indian figures.
(M.A. Econ., Punjab, 1972)

5. Describe the main difficulties in the collection of data pertaining to National Income in India. What are your suggestions for solving these difficulties ?
(M. Com., Gorakhpur, 1972)


6. What is National Income ? What statistical methods are employed in its estimation ? Bring out clearly the difficulties in the estimation of National Income of India. (B. Com., Bangalore, 1973)

7. Write a critical note on the method used in estimating India’s National 
Income. (M. Com., Gorakhpur, 1974) 
8. Explain the method currently followed in India for estimating national income. (B. Com., B.H.U., 1973)


9. Discuss the important methods used in the calculation of national income. Mention briefly the difficulties in the calculation of national income in India. (B.A., Bombay, 1974)

10. Point out how the social accounting system of measuring national income is an improvement over other methods. Why has it not become popular in our country ?


11. Explain the method of estimating national product in India. What are its shortcomings ? Suggest improvements. (M. Com., Delhi, 1971)


12. Discuss the main features of national income accounting in India. 
(M. Com., Punjab, 1975) 
" 13. Briefly describe the technique of constructing socia] accounts and discuss 
its uses, (M. Com., Delhi, 1975) 


14. Critically examine the technique by which national income is estimated 
at present in our country. (M. Com., Gwalior, 1976) 


Section 


9 Financial Statistics 


Financial statistics are considered to be one of the most important 
economic indicators. Broadly speaking, they are divided under the 
following three heads : 


A. Statistics relating to Banking, Currency, Exchange and Bullion.
B. Statistics relating to Public Finance.
C. Statistics of Insurance Companies and Financial Institutions.


A. STATISTICS RELATING TO BANKING, CURRENCY,
EXCHANGE AND BULLION 


The Reserve Bank of India happens to be the most important 
agency publishing statistics relating to Banking, Currency and Ex- 
change. The following are the main publications of the Reserve Bank 
containing these statistics : 


1. Reserve Bank of India Bulletin. This is a monthly publication 
of the Research Department, Reserve Bank of India. This happens to be 
the leading source of data on banking, money and credit. It contains a
large number of tables (more than 50) pertaining to the various aspects 
of finance, currency, banking, exchange, etc. 


2. Report of the Trend and Progress of Banking in India. This is 
an annual publication of the Reserve Bank. It contains a review of 
major developments in the banking field during the calendar year
(January-December). This publication contains statistics, amongst 
other things, on the liabilities and assets of the Reserve Bank, consoli- 
dated position of the scheduled banks, analysis of investment of banks, 
interest and money rates, velocity of circulation of deposit money, 
cheque clearing, etc. This report also contains a very useful table 
giving the trend and progress of banking in India at a glance. 

3. Statistical Tables Relating to Banks in India. This is also an 
annual publication of the Reserve Bank. This publication gives statistics of
not only scheduled banks but also of non-scheduled banks whether 
registered in India or abroad. It is divided into three parts: 


(i) Summary tables: they give the more important items of assets and liabilities of the several classes of banks.

(ii) Detailed tables: particulars regarding individual banks are given in the detailed tables.

(iii) Appendices: they contain information on the location of offices of banks and the centre-wise details of bank deposits and bank credit in India.
4. Report on Currency and Finance. This is an annual publication of the Reserve Bank. A large variety of authentic statistics pertaining to banking, currency and bullion are available in this publication. The report has two parts :


(i) Part I gives overall trends in the economy during the financial year. It covers budgetary operations, trends in money supply, investment finance for agriculture, advances to agriculture, small-scale and other sectors, balance of payments, external assistance, international monetary situation, etc.


(ii) Part II deals in detail with the developments in the various 
sectors of the economy with a separate chapter for each of these such 
as output and price trends, selective credit controls and other banking 
developments, rural credit, capital market, budgets and public debt, 
trade, tariffs and exchange control, world trade and payments, currency 
and coinage, etc. 


Besides these two parts, a number of appendices are also given covering various important issues.


5. Statement of Affairs of the Reserve Bank of India. This is a weekly statement issued by the Reserve Bank and is divided into two parts giving separate figures of assets and liabilities of the Banking and Issue Departments of the Reserve Bank.


6. Statement of Affairs of the Scheduled Banks. This is issued every week by the Reserve Bank. It provides a consolidated statement about the position of the scheduled banks. Statistics available relate to time liabilities, demand liabilities, bills discounted in India, total cash balances with the Reserve Bank, etc. It relates to their position as at the close of each Friday. To facilitate comparison, figures are given for the past three weeks.


7. Statistical Tables Relating to the Co-operative Movement in India. This is an annual publication of the Reserve Bank. The information published relates to provincial and central co-operative banks and credit societies, land mortgage banks and other types of co-operative societies.


8. Statistical Abstract of the Indian Union. This is an annual publication and, besides statistics on other aspects, it also contains very useful statistics pertaining to banks.


Critical Review of the Banking Statistics 


By and large, banking statistics are quite satisfactory in our country. This is because the Reserve Bank is primarily responsible for publishing these statistics. However, banking statistics suffer from a few limitations. For example, no statistics worth the name are available on the non-scheduled banks or on other small banks like the indigenous banks, which carry on banking business but do not come within the purview of the Banking Companies Act. Also, no information is available with regard to the purpose for which advances are made. This information is very necessary, as it enables us to know whether more funds are being directed to trade, industries, transport, agriculture, etc. Similarly, figures relating to secured advances are classified according to the types of securities; it would be more useful if figures relating to unsecured advances were also given, however nominal they may be.


B. PUBLIC FINANCE STATISTICS 


Public finance statistics, broadly speaking, cover the following: 
(i) Statistics pertaining to finances of the Central and State 
Governments. 
(ii) The balance of payments statistics, 
(iii) The income-tax statistics, and 
(iv) Public debt statistics. 


Statistics pertaining to Finance of the Central and State Governments

These statistics fall under the following four heads :


(a) Revenue,

(b) Capital,

(c) Debt, and

(d) Remittances.


Statistics relating to the above heads are compiled by the office of the Comptroller and Auditor-General of India and also by the CSO. They are available in the annual Budgets as well as in other publications of the Government. The Economic Division of the Ministry of Finance brings out every year a document entitled “An Economic Classification of the Central Budget”. The classification follows the technique of social accounting and groups together like items after eliminating all accounting transactions.


Balance of Payments Statistics


The International Monetary Fund (I.M.F.) defines balance of payments as “a systematic record of all economic transactions during a certain period between residents of the reporting country and the residents of other countries referred to for convenience as foreigners.” The I.M.F. has been trying to introduce uniformity in the compilation and presentation of balance of payments statistics. It has accepted a standard schedule which all member-countries have to adopt.


In India, the Reserve Bank, through its Department of Research and Statistics, has been compiling and publishing balance of payments statistics since 1948 according to the following schedule introduced by the I.M.F. :


(1) Imports C.I.F.
(2) Exports F.O.B.
(3) Trade balance (2 - 1)
(4) Official donations
(5) Other invisibles (net)
(6) Current balance of payments (net) (3 + 4 + 5)
(7) Errors and omissions
(8) Official loans (gross)
(9) Other capital transactions (net)
(10) Drawings on I.M.F. (net)
(11) Draft on foreign exchange reserves
(12) Current balance of payments (total of 7 to 11).
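The derived lines of the schedule, items (3), (6) and (12), are simple sums and differences of the other items. The sketch below computes them; the figures (in Rs. crores) and the function name `derive_items` are hypothetical, and the sign conventions (imports entered as a positive figure and subtracted, financing items signed so that they offset the current balance) are assumptions for illustration rather than the I.M.F.'s published rules.

```python
def derive_items(s):
    """Fill in the derived items of the schedule from the reported ones.

    `s` maps item numbers 1, 2, 4, 5 and 7 to 11 to their values.
    """
    items = dict(s)
    items[3] = s[2] - s[1]                       # (3) Trade balance = (2) - (1)
    items[6] = items[3] + s[4] + s[5]            # (6) Current balance = (3) + (4) + (5)
    items[12] = sum(s[k] for k in range(7, 12))  # (12) Total of items (7) to (11)
    return items


# Hypothetical figures (Rs. crores), not actual Reserve Bank data.
reported = {
    1: 1200.0,  # Imports c.i.f.
    2: 950.0,   # Exports f.o.b.
    4: 40.0,    # Official donations
    5: 110.0,   # Other invisibles (net)
    7: -15.0,   # Errors and omissions
    8: 60.0,    # Official loans (gross)
    9: 20.0,    # Other capital transactions (net)
    10: 0.0,    # Drawings on I.M.F. (net)
    11: 35.0,   # Draft on foreign exchange reserves
}
result = derive_items(reported)
print(result[3], result[6], result[12])  # -250.0 -100.0 100.0
```

The hypothetical financing items were chosen so that their total, item (12), matches the size of the current-account deficit shown in item (6).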


Information is available separately about : 
(i) Sterling area, 

(ii) Dollar area,

(iii) O.E.E.C. countries, and 

(iv) Rest of non-sterling area. 


The Reserve Bank of India brought out a bulletin entitled “India's Balance of Payments from 1948-49 to 1955-56” in January 1957. The booklet furnishes a detailed account of statistics relating to India's balance of payments for that period. Statistics for this period and for 1955-56 onwards are available in the monthly bulletins issued by the Reserve Bank.

Income-tax Statistics 

Income-tax statistics serve many useful purposes. For example, they show the direction of the economy and indicate the efficiency of the administrative agency by providing estimates of evasion, realisation, outstandings, etc. Income-tax statistics are regularly published by the Central Board of Revenue in a publication entitled “All-India Income-Tax Revenue Statistics”, which gives a number of statements. Some of the statements are as follows :

(i) Income-tax assessments by types of assessees.

(ii) Distribution of income according to source and type of income.

(iii) Distribution of income-tax according to source of income.

(iv) Distribution of income of individuals.

(v) Distribution of assessees, income and taxation of individuals according to grades and sources of income.


Public Debt Statistics* 


Developing economies have to borrow for short and long periods in order to meet their expenses. Detailed statistical information regarding the debt position of the Central Government and also that of State Governments is available through official documents. The debt obligations of the Central Government are classified into : (i) Permanent debt, and (ii) Floating debt. Both of these are again classified as those raised in India and outside. Rupee loans of the Indian Government are further classified according to maturity period, and foreign debt according to currency and country. The State Governments' debt position is classified similarly. Statistics pertaining to public debt are published by the Reserve Bank.

*Public debt comprises (i) permanent debt, i.e., loans raised in the market in India as well as long-term securities issued to the Reserve Bank of India in conversion of ad hoc Treasury bills, (ii) floating debt, namely, Treasury bills and special floating loans representing the liability assumed in respect of special rupee securities (non-negotiable and non-interest bearing) issued in payment of India's ... as contribution towards the share capital of the Asian Development Bank, and (iii) external debt. (Report on Currency & Finance, 1969-70, p. 165.)


C. STATISTICS OF INSURANCE COMPANIES AND 
FINANCIAL INSTITUTIONS 


A large variety of statistics relating to insurance companies are 
available. They are published in a number of publications by govern- 
ment, semi-government and private organisations. The most important 
of these is the Indian Insurance Year Book. This is an annual official
publication of the Controller of Insurance, Government of India. The 
Year Book is a complete account of various aspects of insurance business 
in India. It gives an exhaustive list of insurers doing various kinds of 
insurance business in India, together with details of new business and 
an analysis of their activities. 


Since independence a number of financial institutions have been 
set up to meet the financial requirements of the industries. Important 
amongst these are Industrial Finance Corporation, Industrial Credit and 
Investment Corporation, Industrial Development Bank, National Small 
Industries Corporation, etc. Besides these in each State there is a State 
Financial Corporation. Statistics pertaining to these corporations can 
be obtained from their annual reports. 


SUGGESTED READINGS 
Reserve Bank of India Bulletin (Monthly). 
Supplement to the Reserve Bank of India Bulletin. 
Report on Currency and Finance (Annual). 
Statistical Tables Relating to Banks in India (Annual). 
Trend and Progress of Banking in India (Annual). 
All India Rural Debt and Investment Survey, 1961-62. 
Statistical Statement Relating to Co-operative Movement in India (Annual) 


TRY YOURSELF 


1. What are financial statistics? How are these collected in India ? 
(M.A. Econ., Jabalpur, 1974)


2. Comment on the adequacy and reliability of financial statistics in India 
and suggest methods of improvement. 


Section 


10 | National Sample Survey 




Need for National Sample Survey


The absence of reliable statistics relating to production, consumption and other aspects of economic and social life ... for a long time. Since 1947 the development of statistics has, therefore, been a continuing concern of the Government of India. In 1948 at the ... to the Cabinet and Director of the Indian Statistical Institute (ISI), Calcutta. In the year 1950, a National Sample Survey Organisation was set up in the Department of Economic Affairs, Ministry of Finance. The setting up of this organisation was a ...


The surveys are arranged in a number of successive rounds per 
year, each round comprising a period of 3 months or more. 


The sampling method has several advantages over the census method, such as greater adaptability, speed and economy. It is also more scientific than a census enquiry. In consequence, the general experience all over the world has been that a properly conducted sample survey ... information at short intervals. From practical considerations, the choice is between organising sample surveys and having no information at all. The random sampling method is thus the only common-sense way to collect information about a rural economy as a whole, which is made up of the activities and the ways of living of roughly 6 crores of individual households scattered over the vast area of 2.03 million sq. km.


The information is collected mainly by the interview method in 
which the investigators visit each household included in the sample and 
make direct enquiries from the householders. In the case of crops and 
certain other items, the investigators collect the data by their own 
direct observations. The investigators are employed on a whole-time
basis and work throughout the year ; in addition there is a whole-time 
inspecting, supervising and auxiliary staff. 

The technical and statistical work (including the design of the 
survey, processing of the data and the writing of the report) was being 
done by the Indian Statistical Institute in Calcutta, prior to the forma- 
tion of NSSO. 


Functions 


The Directorate of NSS worked under the Statistics Department 
of the Cabinet Secretariat. Its main functions were : 


1. Collection of socio-economic data relating to demographic
conditions on a continuous basis in a comprehensive manner for the 
whole country. A major objective of NSS has been to provide data 
needed for filling up gaps in information required for national income 
estimation by the CSO. 


2. Collection of data relating to the organised industrial sector 
of the country. 


3. Supervision of the surveys conducted by States in agricultural 
sector through their own agencies and also giving guidance to States for 
analysis and co-ordinating the results of these surveys. 


The programme of data collection by NSS was done by rounds of 
surveys. The period of a round generally coincides with the agricul- 
tural year. 


In conformity with the above three functions of the NSS Direc- 
torate, its activities were entrusted to three main divisions as follows : 
1. The Multi-purpose Socio-Economic Statistics Division. 

2. The Industrial Statistics Division.
3. The Agricultural Statistics Division. 


The NSS Directorate was assisted by the CSO, the ISI and the State Governments in its work. The CSO drew up survey designs and tabulation programmes and fixed priorities, besides co-ordinating the different aspects of surveys. The ISI handled the work relating to preparation of designs, programming, drawing up of schedules, processing, tabulation and analysis of data, including report writing. The State Governments participated in NSS programmes on a matching basis and collaborated in training programmes and other matters of mutual interest. The Directors of the Statistics Departments of various States were designated as Officers on Special Duty under the Chief Director of Industrial Statistics. The NSS was further assisted in its functions by the NSS Programme Advisory Committee. The Committee advised the Government on the relative priorities to be adopted for various items in the various rounds of the NSS and also on the tabulation of data. The Committee comprised experts from the Planning Commission, the Ministries of the Union Government (Finance, Industry, Food and Agriculture, Labour and Employment), the CSO, the ISI, and the participating States (i.e., all States except West Bengal).


The First Round of Survey 


The first round of survey started in October, 1950 and was completed in March, 1951. In this round a sample of 1,833 villages (out of a total of about 5,60,600 villages for the whole of India) was selected for investigation. The sample villages were scattered all over the country and some were located in areas difficult of access. For example, there were villages in the wide areas of Orissa such as Kalahandi, where the investigators had to go through more than 20 miles of wild forests under the protection of armed guards. For another sample village in Northern U.P., an investigator had to wait for the snow to melt to go over the high passes of the Himalayas.


The sample was divided into two groups of villages each of which 
was scattered throughout the country, and two different sets of schedu- 
les were used in the two groups. One set (which was prepared in the 
Indian Statistical Institute) was employed for collecting information in 
the first group of 1,189 villages and the second set of the schedules (pre- 
pared by the Gokhale Institute of Politics and Economics) was used in 
the second group of 644 villages. The technique of selection of villages 
was quite instructive and interesting. The whole country was divided into 250 geographical strata representing conditions of various economic, social and regional orders. In each stratum the number of villages was so fixed that it was divisible by 3, so that the villages could be divided into two groups in the ratio of 2 : 1.
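Requiring each stratum's village count to be divisible by 3 is what makes an exact 2 : 1 split possible. A minimal Python sketch of that split follows. The random shuffle used to decide which villages go to which group, and the function name `split_stratum`, are assumptions for illustration; the text does not say how villages within a stratum were assigned to the two groups.

```python
import random


def split_stratum(villages, seed=None):
    """Divide one stratum's sample villages into two groups in the ratio 2 : 1.

    The first group would receive the Indian Statistical Institute schedules and
    the second the Gokhale Institute schedules. Random assignment is assumed here.
    """
    if len(villages) % 3 != 0:
        raise ValueError("the number of villages in a stratum must be divisible by 3")
    shuffled = list(villages)
    random.Random(seed).shuffle(shuffled)  # reproducible when a seed is given
    cut = 2 * len(shuffled) // 3
    return shuffled[:cut], shuffled[cut:]


first_group, second_group = split_stratum([f"village-{i}" for i in range(1, 7)], seed=1950)
print(len(first_group), len(second_group))  # 4 2
```

A stratum of six villages yields four for the first set of schedules and two for the second, which mirrors the four-and-two arrangement given to each investigator in the field.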


The schedules used in the first group of 1,189 villages were of 
the following four types : 


1. Village Schedules which were used for listing all households of 
a sample village, for collecting information on land utilisation, and for 
collecting prices of selected commodities, such as cereals, pulses, oils, 


vegetables, etc., and rates of daily wages of various types of skilled and 
unskilled workers. 


2. Household Schedules (First Set) were used for collecting general particulars on demographic and economic conditions such as age, sex, marital status, economic and employment status of the members of households, as well as information on the holding and use of land under various categories.
3. Household Schedules (Second Set). Another set of household schedules for a smaller number of sample households was used for collecting detailed information on household enterprises and activities such as agriculture and animal husbandry, industry, crafts, trade, services and profession.


4. Household Schedules (Third Set). Another set of household schedules for a smaller number of sample households was used for collecting detailed information on household enterprises and activities such as agriculture and animal husbandry, crafts, trade, services and profession.


5. Household Schedules (Fourth Set). Another set of household schedules for a small number of sample households was used for collecting detailed information on consumption, in value and wherever possible in quantities, of food, beverages, fuel and light, rent, clothing and various other items.


The schedules used in the second group of 644 villages were prepared at the Poona Institute. These schedules also covered comprehensively the demographic and economic characteristics of the households, but in somewhat less detail. There was, however, an important difference in regard to the period of time for which the particulars were to be collected. Most of the information collected in the schedules drawn up by the Indian Statistical Institute related to the one-year period July 1949 to June 1950. The reference period in the schedules prepared by the Poona Institute was shorter and was mostly either a month or a week preceding the date of enquiry, depending upon the nature of the item. The use of these two different sets of schedules had the advantage that in the very first round of the survey it was possible to obtain all-India data for two different periods: first, for the period July 1949 to June 1950, and secondly, for the period September 1950 to February 1951.


For the field work, India was divided into 16 parts, each called a block. A block consisted either of a single large State like Uttar Pradesh, or of a number of smaller States grouped together, as for example, Assam, Tripura and Manipur. The 1,833 villages were then assigned to the different blocks in proportion to their 1941 populations. Each of these blocks was further sub-divided into smaller areas (not necessarily of equal size) in such a way that a multiple of three villages (usually six villages) could be assigned to different areas keeping the population proportion the same. An investigator was given a group of six such villages to survey, out of which four villages would have the Calcutta Schedules and the other two villages the Poona Schedules.
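Assigning the 1,833 villages to the blocks in proportion to population is a proportional-allocation problem: the exact shares are rarely whole numbers, so some rounding rule is needed. The text does not state which rule was used, so the sketch below assumes the largest-remainder method; the function name `allocate`, the three blocks and their populations are hypothetical.

```python
def allocate(total, populations):
    """Allocate `total` sample villages to blocks in proportion to population.

    Each block first gets the whole-number part of its exact share; the villages
    left over go, one each, to the blocks with the largest fractional parts
    (largest-remainder rounding, assumed here).
    """
    pop_sum = sum(populations.values())
    shares = {block: total * pop / pop_sum for block, pop in populations.items()}
    alloc = {block: int(share) for block, share in shares.items()}  # floor of a non-negative share
    leftover = total - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda b: shares[b] - alloc[b], reverse=True)
    for block in by_remainder[:leftover]:
        alloc[block] += 1
    return alloc


# Hypothetical 1941 populations (lakhs) for three blocks, not actual census figures.
print(allocate(1833, {"Block A": 563, "Block B": 411, "Block C": 97}))
# {'Block A': 964, 'Block B': 703, 'Block C': 166}
```

The exact shares here are about 963.57, 703.42 and 166.01; flooring them leaves one village over, which goes to Block A because it has the largest fractional part.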


So far 30 rounds have been completed and the 31st round is in progress. A number of reports have been published, numbering more than 260. Each report contains at the end a complete list of the type of information collected in various rounds, with code numbers, so that there is no difficulty in referring to the particular issue with which the reader is concerned. The following are some of the publications of the NSS :
Serial No.   Publications

1. General Report No. 1 on the First Round (October 1950-March 1951).

20. Report on Pattern of Consumer Expenditure (2nd to 7th Rounds: April 1951-March 1954).

26. A Preliminary Report on Housing Condition (7th Round: October 1953-March 1954).

34. Tables with Notes on Employment and Unemployment (Tenth Round: December 1955-May 1956).

136. Tables with Notes on Capital Formation (Urban) (17th Round: September 1961-July 1962).

170. Tables with Notes on Housing Condition (18th Round: February 1963-January 1964).

171. Some Results relating to Construction of Pucca Houses in Rural and Urban Areas of India (22nd Round: July 1967-June 1968).

176. Some Results of the Land Utilization Survey and Crop Cutting Experiments (22nd Round: July 1967-June 1968).

177. Vital Rates in India (19th Round: July 1964-June 1965).

178. Tables with Notes on Annual Survey of Industry-1965, Sample Sector, Detailed Results.

180. Tables with Notes on Fertility and Mortality Rates in Urban Areas of India (16th Round: July 1960-August 1961).

181. Tables with Notes on Urban Labour Force (21st Round: July 1966-June 1967).

182. Tables with Notes on Internal Migration (18th Round: February 1963-January 1964).


Ad Hoc Surveys


Besides the regular rounds, ad hoc surveys have been carried out 
by the NSS in collaboration with the Central Ministries concerned. 
Some of the ad hoc surveys conducted by the NSS are : 


1. A survey of housing conditions for the Ministry of Works, Housing
and Supply.

2. A survey of the habit of newspaper reading for the Press
Commission, Ministry of Information and Broadcasting.

3. A survey of employment seekers for the Delhi Employment
Exchanges.

4. Family budget enquiries in 50 factory, mining and plantation
centres for the construction of consumer price indices by the Ministry of
Labour.

5. A land cultivation survey for the Ministry of Food and
Agriculture.


Besides the various rounds and ad hoc surveys, the NSS Director- 
ate collected a variety of other data. For example, since 1959 the 
Annual Survey of Industries was being conducted by the NSS. In the 
sphere of agricultural statistics, the NSS Directorate's primary function 
was to offer technical guidance to the States to assist them in the
planning and organisation of State series of crop estimates of the chief food
and non-food crops. The NSS also helped in the development of 
uniform concepts, definitions and other details of crop forecasts and 
co-ordinated the results received from various States. 


National Sample Survey Organisation (NSSO) 


The National Sample Survey (NSS) was instituted in 1950 to 
conduct sampling enquiries for collecting socio-economic data on a 
countrywide basis. In 1970 a National Sample Survey Organisation 
(NSSO) was created in terms of the Government of India Resolution of 
March 5, 1970 and the Directorate of National Sample Survey was 
made a part of the National Sample Survey Organisation and was 
renamed as the Field Operations Division (FOD). The NSSO is the
central agency for the collection and independent checking of data
required for purposes of Central Planning and National Income
estimation. It offers an established source of statistical information on
many important subjects, which is useful to the Government as well as
to research workers in connection with planning and national
development. To ensure that the collection, processing and publication
of data are free from undue influence, the activities of NSSO are
governed by an independent Governing Council consisting of a
non-official chairman, four non-official and five official economists and
statisticians, and four Directors of the functional divisions of NSSO as
members, with the Chief Executive Officer as its Member-Secretary.
The council has full authority to formulate its short-term and
long-term programmes.


Criteria for the Assessment of Results 


Sometimes a question is asked as to what the criteria are by which 
the reliability of the results obtained can be judged. There are three 
broad approaches which, of course, are complementary and must be 
used jointly and simultaneously. 


First, the great scientific merit of the sampling method is that in 
a properly designed sample survey, it is possible to calculate the margin 
of error of the results from the sample data themselves. Another 
possibility is to carry out special ‘quality’ checks by highly trained and 
experienced workers (including senior statisticians) who would them- 
selves go out to the fields and directly collect some of the critical 
primary data. By comparing the results of such quality checks with 
those based on the data collected by the regular investigators in the
usual way, it is possible to get a good idea of the reliability of the
general information. In a continuing sample survey like the NSSO, it
is further possible to conduct type investigations for intensive study of
particular problems by technically qualified workers.

All the above methods are internal in the sense that information
would be collected by the NSSO itself in many different ways (inter- 
penetrating sub-samples, quality checks, type studies, etc.), with a view 
to improving the reliability of the results. 
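The claim that a properly designed sample yields its own margin of error can be illustrated with interpenetrating sub-samples. The Python sketch below is only an illustration under stated assumptions: it supposes that k independent sub-samples, drawn by the same design, each give an estimate of the same quantity; the pooled estimate is then taken as their simple mean and its standard error as the standard deviation of the sub-sample estimates divided by the square root of k. The function name `combine_subsamples` and the four figures are hypothetical and are not NSSO data.

```python
import math
import statistics


def combine_subsamples(estimates):
    """Combine independent sub-sample estimates of the same quantity.

    Returns the pooled estimate (simple mean) and its estimated standard
    error, computed from the spread between the sub-sample estimates.
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two independent sub-samples")
    pooled = statistics.mean(estimates)
    # statistics.stdev uses the (k - 1) divisor, i.e. the sample standard deviation
    std_error = statistics.stdev(estimates) / math.sqrt(k)
    return pooled, std_error


# Hypothetical estimates of one characteristic from four sub-samples
estimates = [20.4, 21.1, 19.8, 20.7]
pooled, se = combine_subsamples(estimates)
print(f"estimate = {pooled:.2f}, standard error = {se:.2f}")
```

Wide disagreement between the sub-samples shows up directly as a large standard error, which is why comparing independent sub-samples works as an internal check on reliability.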
A second broad approach is to use external checks by comparing
the NSSO results with data obtained from entirely independent sources. 
Such checks are of great value. At times test items are included in 
the NSSO schedules with the deliberate intention of using such infor- 
mation for purposes of comparison with data obtained from indepen- 
dent sources. 


The third approach rests on the final criterion of all scientific
investigation, namely, the reliability of forecasts made on the basis of
present knowledge.


The validity of the NSSO results, thus, will have to be assessed
by piecing together a large mass of evidence based partly on internal
consistency, partly on external checks, and ultimately on the accuracy
and social usefulness of the estimates and forecasts. The different
strands of evidence cannot be expected to be all concordant; some of
the results may be contradictory. This does not, however, nullify the
usefulness of the results, because the theory of probability implies that
forecasts made on the basis of sampling must sometimes prove wrong.
There can be no denying the fact that the NSSO is doing a very
useful job in filling the statistical gaps that existed on various
aspects of the economy of our country. The National Sample Survey
Organisation has opened branch units at numerous places and its
staff numbers about 2,000 trained persons. It can be confidently said
that in the years to come the NSSO will be able to provide a lot of
useful data on almost all aspects of the economy. Such data will be
useful not only in planning but also to research workers and research
institutions in fulfilling their aims.


An Appraisal of the Work done by NSSO 


The NSSO has done commendable work since its inception. It
has collected a large volume of reliable data on most of the important
items relating to the economic, social and demographic characteristics of
our people in the rural and the urban areas. Its work is all the more
praiseworthy in view of the great difficulties in its way, chiefly the low
level of literacy, the suspicion resulting from ignorance and the
indifference of the common man. However, there is scope for further
improvement. The following suggestions are made to enhance the
utility of the work done by NSSO:


1. The scope of the enquiry should be widened by NSSO. 

2. Very often the questionnaires used are confusing and cumbersome,
and the informants are reluctant to give the information. There
is an urgent need for simplifying the questionnaire.

3. The relevant period of these surveys has changed from round
to round, making comparison difficult. It is suggested that some
uniformity be maintained in this regard.

4. There is considerable delay in the publication of information,
which greatly reduces the significance of the results obtained. For
example, the findings of the 16th round (July 1960-August 1961) were
released only in May 1971 (S.N. 180), under the title "Tables with Notes
on Fertility and Mortality Rates in Urban Areas of India". Efforts must
be made to minimise the delay in publication so that the information
can be used in time.


TRY YOURSELF
1. Write a brief critical note on the aims and achievements of the National
Sample Survey. (M.A. Econ. Jabalpur, 1974)
2. Discuss briefly the design and organisation of the National Sample
Survey and indicate some of its broad results. (M.A. Econ. Lucknow, 1972)
3. Write a critical note on the work done by the National Sample Survey
Organisation. (B. Com. Punjab, 1972)
4. Give a brief account of the design of the survey adopted for the first
round of the NSS. What were its main findings? (B. Com. Bombay, 1972)
5. Give a brief but analytical account of the work done by the NSSO in
our country. (M. Com. Agra, 1973)
6. Discuss the functions and working of National Sample Survey Organisation
in India. (M. Com. Gwalior, 1976)


Section 11 | Critical Appraisal of Indian Statistics


In the previous chapters a brief account of the nature, sources and
limitations of statistics pertaining to some important economic problems
of our country has been given. There can be no denying the fact that
the availability of data, in respect of scope, coverage, reliability, etc.,
has improved considerably since independence because of the various
steps taken by the Government. These steps are:


1. Setting up of the CSO.
2. Setting up of the National Sample Survey Organisation.
3. Setting up of the Directorate of Economics and Statistics.
4. Setting up of the Office of the Registrar General.
5. Setting up of the State Statistical Bureaux.
6. Attaching Statistical Units to most of the Ministries.


Despite the various steps taken by the Government, our statistics
suffer from a number of limitations. The important amongst these
are:


1. Limited coverage. The coverage of our statistics is very poor.
There are many important fields even today on which either no
statistics are available or they are very meagre. For example, in respect
of small-scale and cottage industries, consumption, employment and
unemployment, construction activity, etc., sufficient data are not
available. In the case of organised industries, out of 63 industries as
classified by the Census of Manufactures, only 28 industries have been
covered so far. Similarly, agricultural statistics do not cover the entire
land area, and wherever land area is covered, field-to-field coverage has
not been possible.


2. Delay in publication. In some cases there is considerable
delay in the publication of the results, and by the time figures are
published they become obsolete and much of their usefulness is lost.
To give a few instances, there is a time-gap of about five years in the
publication of the different volumes of the Annual Survey of
Industries. The third census after independence was carried out in
1971, but surprisingly complete information pertaining to the census is
not yet available. To give another instance, crop estimates are
published even later than the arrival of the crop in the market and thus
serve no useful purpose. Of course, there is bound to be some time-lag
between the collection of statistics and their publication, but effort must
be made to minimise this time-gap so that the published information is
of use to those for whom it is meant. Quite often the delay in
publication is on account of the fact that the informants have not
supplied the information in time. In some cases a time period is
prescribed within which the information is to be supplied, failing which
a fine is to be levied. In practice, however, fines are often not realised
even in cases of undue delay.


3. Inaccuracy of data. In respect of certain statistics, doubts
have been expressed with regard to their reliability. For example,
statistics of employment and unemployment, crop output, etc., are not
regarded as very reliable. The chief cause of unreliability is the
paucity of trained investigators; untrained investigators fail to obtain
correct information. For example, agricultural statistics are collected by
Patwaris or Lekhpals who are not technically trained for the job.
Moreover, the work of collecting data is very often regarded as
uninteresting routine work. In some cases the persons responsible for
obtaining the information are part-time workers or are paid only a
nominal honorarium. For example, the census work is carried out by
thousands of investigators who are given training for a very short time
and paid a very small honorarium. The inaccuracy is also due to the
indifference of the informants, who very often furnish inaccurate
information for one reason or another.


4. Data not strictly comparable. In some cases there is no
uniformity in the method of collection, classification, definition of
statistical units, etc., with the result that data are not strictly comparable.
Inter-State comparisons are possible only when there is uniformity with
regard to the method of collection, concepts, definitions, etc. In some
cases more than one organisation is collecting statistics on the same
aspect and the figures collected by them differ because of differences in 
procedures, definitions, etc. For example, agricultural statistics 
published by the Food and Agriculture Ministry and the Reserve Bank 
of India show differences. Similarly unemployment statistics published 
by the NSS and the census office show wide differences. 


5. Lack of proper analysis. Some statistics in India are collected
to suit administrative needs and are, therefore, not properly analysed
and processed. Since statistics do not speak for themselves, their
utility can be greatly enhanced if a critical appraisal of the available
data is carried out.


6. Lack of clarity. Till recently one important limitation of our 
statistics was that the published data were not self-explanatory and their 
scope, significance and methods of compilation were not fully known. 
This limitation has been largely removed by the publication of the
“Guide to Current Official Statistics" by the Economic Adviser to the 
Government of India under the Ministry of Commerce and Industry. 
The CSO publishes guides for different types of statistics in the country. 
The State Governments are also publishing separate guides for statistics 
of individual commodities like cotton, jute, oilseeds, textiles, etc. 
Except for a few cases most of the journals and magazines that provide 
statistical information contain the necessary footnotes, source notes, 
etc., to clarify any points in the data published. 
7. Lack of proper co-ordination. Last but not least, in some
cases our statistics lack proper co-ordination, and there is overlapping
and duplication in the process of collection, which results in a waste of
time, money and energy and causes confusion. Though the setting up
of the CSO has solved the problem of co-ordination to a large extent,
there is still need for greater co-ordination between the Central and
State Governments' statistical organisations.


Efforts are being made to solve the above problems. Before some
suggestions are made for improving our statistics, it will be
worthwhile to examine the main difficulties that come in the way of
collection of statistics in India.


Difficulties in Collection of Statistics in India 


1. Most of the people in India are illiterate and do not
understand the utility of statistics. Because of this they are very
indifferent to statistical enquiries. They also do not keep any records
of age, income, etc.


2. Most people look upon statistical enquiries with suspicion.
They feel that the statistics supplied will be used to their disadvantage,
and for this reason they do not supply correct information. This is
particularly so in the case of income, production, sales and
consumption statistics.


3. There are wide regional diversities, and different languages
are spoken in different parts of the country, which complicates the task
of data collection.


4. The country is spread over a very wide area and, therefore, an
enquiry at the all-India level becomes very costly, and this restricts the
scope of data collection.


Suggestions for Improvement 


The following are some suggestions for the improvement of the
statistical material available in India:


1. There should be greater coverage, and those fields on which
either no statistics or very meagre statistics are available should be
brought within the purview of data collection.


2. Rural areas should be given greater attention than has been
accorded to them so far.


3. Proper training should be imparted to those who are entrusted
with the task of collecting data, and a sample check on their work
should be exercised.


4. There should be greater co-ordination between Central and
State departments and also with outside agencies.


5. In order to facilitate comparison, statistical definitions and
methods should be standardised.


6. Greater consciousness should be created in the public regarding
the utility of statistics, and people should be convinced that the
statistics supplied by them will not be used to their disadvantage.

7. Delay in the publication of statistics should be avoided, and
those responsible for delay should be subjected to fines which are
more effectively realised.


8. Wherever the questionnaire method is used, the questionnaire should
contain some questions so drafted that a cross-check, at least in respect
of vital questions, can be exercised.


9. Greater publicity of the available statistics is necessary to
enhance their utility. Wherever possible, the collected data should be
presented in the form of diagrams and graphs so that the facts presented
become more interesting to the reader.


10. To ensure greater clarity of the published data, not only notes
and footnotes but also details about the method of collection, the
margin of error and the limitations of the collected statistics should be
given.


TRY YOURSELF
1. Critically examine any two of the following statements:
(i) "The main charge against Indian statistics is that they are
inadequate."
(ii) "Indian statistics are unco-ordinated."
(iii) "The coverage of Indian statistics is poor." (B. Com., Punjab, 1972)
2. Describe the general shortcomings of Indian statistics emerging from
official sources. How can the quality of these statistics be improved?
(M. Com. Gorakhpur, 1974)
3. "Indian statistics are gravely inexact, unnecessarily diffused, incomplete
and misleading."
Comment on the statement and discuss the measures taken by the Government
since independence to improve the quality and quantum of Indian statistics.


Miscellaneous Questions
1. Describe the organisation and working of any two of the following in
India:
(i) CSO
(ii) NSSO
(iii) Population Census. (M.A. Econ., Lucknow, 1972)
2. Describe briefly the functions of CSO in India. How far has it been
successful in setting up standards and norms and in bringing about co-ordination of
statistics in the country? (M.A. Econ., Lucknow, 1973)


3. In respect of any three of the following six items give in each case (i) the 
name of a publication which you would look up for latest available statistics, (ii) the 
periodicity for the publication, (iii) the main features of the publication, and (iv) any 
suggestions you may have for its improvement : 

(a) Area and yield of principal crops in India. 

(b) Volume of internal trade of India. 

(c) Production of manufacturing industries in India. 
(d) Occupational distribution of population in India.
(e) Value of exports of India. 


(f) Rural under-employment in any State in India. 
(M.A. Econ., Lucknow, 1974) 


4. Write short notes on any four of the following : 
(a) Indices of agricultural production. 
(b) Statistics of industrial disputes. 
(c) CSO.
(d) NSSO.
(e) Crop cutting experiments.
(f) Accounts relating to Foreign Trade and Navigation of India.
(B. Com., Punjab, 1974)
5. Write short notes on any two of the following:
(a) Annual Survey of Industries.
(b) Banking Statistics in India.
(c) Growth of National Income during 1951-65.
(B. Com., Punjab, 1972)
6. Comment: "The newly revised agricultural output index of India is
constructed and has wider coverage but it has completely lost its continuity with
the old series." (B.A. Hons., Econ., Delhi, 1970)
7. Link relatives are based on the idea that one series can be converted into
another because time-reversibility holds. Do you agree? Discuss the current
industrial output index of India in this context. (B.A. Hons., Econ., Delhi, 1970)
8. What are the important sources of industrial statistics in India?
(B. Com., Delhi, 1970)
9. Write notes on the information available in India about any two of the
following:
(i) Agricultural Statistics.
(ii) Vital Statistics.
(iii) Labour Statistics. (B. Com., Bombay, 1971)


10. Write a brief note on the statistical information available in the following
publications:
(i) Indian Census Report of 1961.
(ii) The Indian Trade Journal. (B. Com., Bombay, 1973)


11. Write a brief note on the statistical information available in any two of
the following publications:
(i) Monthly Statistics of Foreign Trade of India.
(ii) Reserve Bank of India Bulletin.
(iii) Labour Gazette (Maharashtra State). (B. Com., Bombay, 1973)


12. What improvements have been effected since independence to fill up
the gaps and improve the reliability of official statistics in India? What role has the
CSO played in this direction? (B. Com., Punjab, 1975)


13. "Statistics in India are neither complete nor reliable." Assess the
correctness of this statement. (M.A. Econ. Jabalpur, 1974)


14. What qualitative and quantitative improvements have been made in
economic statistics after independence? (M.A. Econ. Jabalpur, 1975)
1 Probability


The word ‘probability’ or ‘chance’ is very commonly used in day- 
to-day conversation. For example, we come across statements like 
"Probably it may rain tomorrow"; "it is likely that Mr. X may not come
for taking his class today"; "the chances of teams A and B winning a
certain match are equal"; "probably you are right"; "it is possible that
I may not be able to join you at the tea party". All these terms
(possible, probable, likely, etc.) convey the same sense, i.e., that the
event is not certain to take place or, in other words, there is uncertainty
about the happening of the event in question. In layman's terminology
the word 'probability' thus connotes uncertainty about what will
happen. In mathematics and statistics, however, we try to present
conditions under which we can make sensible numerical statements
about uncertainty and apply certain methods of calculating numerical
values of probabilities and expectations. In the statistical sense the term
probability is thus established by definition and is not connected with
beliefs or any form of wishful thinking.

Historical Development 

The development of the theory of probability dates back to the 
seventeenth century. It had its origin in the problems dealing with games 
of chance. Games of chance, as the name implies, include such actions
as tossing a coin, throwing a die, drawing a card from a pack, etc., in
which the outcome of a trial is uncertain. In 1654 Antoine Gombaud,
Chevalier de Méré, a French gentleman with an interest in mathematics,
called upon the French mathematician Pascal for the solution of a
particular gambling problem. Subsequently there appeared various
works of Huygens (1657), J. Bernoulli (1713), De Moivre (1718) and Bayes
(1764), most of whom were concerned with the application of the theory
of permutations and combinations to the calculation of probabilities
associated with various dice and card games.


Starting with games of chance, ‘probability’ today has become one 
of the fundamental tools of Statistics. In fact, Statistics and probability 
are so fundamentally interrelated that it is difficult to discuss Statistics 
without an understanding of the meaning of probability. A knowledge 
of probability theory makes it possible to interpret statistical results, 
since many statistical procedures involve conclusions based on samples 
which are always affected by random variation, and it is by means of pro- 
bability theory that we can express numerically the inevitable uncertainties 
in the resulting conclusions. 

DEFINITION OF PROBABILITY 

The term 'probability' is difficult to define; there is no general
agreement about its meaning, and many people associate probability
and chance with nebulous and mystical ideas. Broadly speaking,
however, there are three different schools of thought on the concept of
probability.

(i) Classical or a priori Probability 

The classical approach to probability is the earliest and easiest. This
school of thought assumes that all possible outcomes of an experiment
are mutually exclusive and equally likely. The words "equally likely"
convey the notion of "equally probable" to the classicists; that is, to them
each outcome of an experiment has the same chance of appearing as any
other.

The definition of probability given by Laplace and generally adopted
by disciples of the classical school runs as follows: probability,* it is
said, is the ratio of the number of "favourable" cases to the total number
of equally likely cases. If probability is denoted by p, then by this
definition we have:

$$p = \frac{\text{Number of favourable cases}}{\text{Total number of equally likely cases}}$$
For calculating probability we have thus to find out two things:
1. Number of favourable cases.
2. Total number of equally likely cases.

For example, if a coin is tossed, there are two equally likely results,
a head or a tail; hence the probability of a head is 1/2.

Similarly, if a die is thrown, the probability of obtaining an even
number is 3/6 or 1/2, since three of the six equally possible results are
even numbers.
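The classical rule can be applied mechanically whenever the equally likely cases can be listed. The Python sketch below (the helper `classical_probability` is a hypothetical name introduced here) counts the favourable cases among an explicitly enumerated set of outcomes and returns an exact fraction; like the classical definition itself, it assumes that every listed outcome is equally likely and that the outcomes are mutually exclusive.

```python
from fractions import Fraction


def classical_probability(outcomes, is_favourable):
    """Classical (a priori) probability: favourable cases / total cases.

    Assumes every element of `outcomes` is equally likely and that the
    outcomes are mutually exclusive.
    """
    outcomes = list(outcomes)
    favourable = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(favourable, len(outcomes))


coin = ["head", "tail"]
die = range(1, 7)

p_head = classical_probability(coin, lambda face: face == "head")
p_even = classical_probability(die, lambda face: face % 2 == 0)
p_seven = classical_probability(die, lambda face: face == 7)  # impossible event
print(p_head, p_even, p_seven)  # 1/2 1/2 0
```

Returning a `Fraction` rather than a float keeps results such as 3/6 = 1/2 exact, which matches the way the text states classical probabilities.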

Symbolically, if an event A can happen in 'a' ways and fail to happen
in 'b' ways out of a total of 'n' equally likely and mutually exclusive
ways, then the probability of occurrence of the event (called its
success)† is denoted by

$$p = \Pr(A) = \frac{a}{n}$$

and the probability of non-occurrence of the event (called its failure) is
denoted by

$$q = \Pr(\text{not } A) = \frac{b}{n} = 1 - \frac{a}{n} = 1 - p = 1 - \Pr(A)$$

Since the sum of the successful and unsuccessful outcomes is equal to the
total number of outcomes,

$$a + b = n$$

Dividing by n,

$$\frac{a}{n} + \frac{b}{n} = 1$$

so that

$$p + q = 1$$


* It is better to call it a measure of … probability rather than as … of
the …

† It should be understood that the words "successful" and "unsuccessful"
are used in a neutral sense. (Alternatively, we might place in the 'a' group
only outcomes marked by the presence of a certain property, and in the 'b'
group outcomes marked by the absence of that property; but it will be
convenient to use the traditional terms.)

F.C. Mills: Statistical Methods, p. 141.




Probability, therefore, may be written as a ratio. The numerator
of the fraction corresponding to this ratio represents the number of
successful (or unsuccessful) outcomes, while the denominator represents
the total number of possible outcomes.

The scale of probability extends from zero to unity (i.e., one).
When p = 0 it denotes impossibility of the event taking place, i.e., the
event cannot take place. However, this is true only when the number of
possible outcomes is finite. For example, the probability of throwing
seven with a single die is zero. On the other hand, when p = 1 it denotes
certainty, i.e., the event is bound to take place. In most cases in practical
life the probability lies between these two extremes, 0 and 1.

Illustration 1. From a bag containing 10 black and 20 white balls, a ball is
drawn at random. What is the probability that it is black?

Solution. Total number of balls in the bag = 10 + 20 = 30

Number of black balls = 10

Probability of getting a black ball:

$$p = \frac{\text{Number of favourable cases}}{\text{Total number of equally likely cases}} = \frac{a}{n} = \frac{10}{30} = \frac{1}{3}$$

Probability of not getting a black ball:

$$q = \frac{20}{30} = \frac{2}{3}$$

Thus,

$$p + q = \frac{1}{3} + \frac{2}{3} = 1$$
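The same arithmetic can be checked with exact fractions. This short Python sketch simply reproduces Illustration 1; using the standard-library `fractions.Fraction` keeps 1/3 and 2/3 exact, so that p + q = 1 holds without any rounding error.

```python
from fractions import Fraction

black, white = 10, 20
n = black + white          # total number of equally likely cases
p = Fraction(black, n)     # probability of drawing a black ball
q = Fraction(white, n)     # probability of not drawing a black ball
print(f"p = {p}, q = {q}, p + q = {p + q}")  # p = 1/3, q = 2/3, p + q = 1
```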


(ii) Relative Frequency Theory of Probability 
The classical definition of probability given above suffers from
certain limitations. First, the definition cannot be applied whenever
it is not possible to make a simple enumeration of cases which can be
considered equally likely. For example, how does it apply to the
probability of rain? What are the possible cases? We might think that
there are two contingencies, 'rain' and 'no rain'. But on any given day
it will not usually be agreed that they are equally likely. Similarly, what
is the probability of a student X passing an examination? Since there
are two possibilities, passing or not passing, we may say that the
probability of X passing the examination is 1/2. But by doing so we may
ignore facts. The student X may be a first-class student, in which case the
probability of passing may be nearly 1. The definition thus fails to
provide a satisfactory answer.
The classical theory also fails to answer questions like "what is the
probability that a male will die before the age of 60?", "what is the
probability that a light bulb will burn less than 200 hours?", etc. All these
are legitimate questions which we want to bring into the realm of
probability theory.
In fact the classical definition is difficult or impossible to apply as
soon as we deviate from the fields of coins, dice and other games of
chance. Secondly, the classical approach may not explain actual
results in certain cases. For example, if a coin is tossed 10 times we may
get 6 heads and 4 tails. The probability of a head is thus 0.6 and that of
a tail 0.4. However, if the experiment is carried out a large number of
times we should expect approximately equal numbers of heads and tails.
As n increases, i.e., approaches ∞ (infinity), we find that the proportion
of heads (or tails) approaches 0.5. The probability of an event can
thus be defined as the relative frequency with which it occurs in an
indefinitely large number of trials. If an event occurs a times out of n, its
relative frequency is a/n, and the value which is approached by a/n when
n becomes infinite is called the limit of the relative frequency.
Symbolically,

$$P(A) = \lim_{n \to \infty} \frac{a}{n}$$

In reality we can never obtain the probability of an event as given
by the above limit. In practice, we can only try to have a close estimate of
P(A) based on a large n. For practical convenience we shall treat the
estimate of P(A) as if it were actually P(A) and write the working
relative frequency definition of probability as

$$P(A) = \frac{a}{n}$$
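The limiting idea can be made concrete by simulation. The Python sketch below uses a pseudo-random number generator as a stand-in for a fair coin (an assumption: a real coin is only approximately fair) and prints the relative frequency of heads for increasing numbers of tosses. The helper name is hypothetical and the seed is arbitrary; it only makes the run reproducible. With small n the frequency may wander well away from 0.5, while with large n it will typically lie close to it.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=None):
    """Toss a simulated fair coin n_tosses times; return heads / n_tosses."""
    rng = random.Random(seed)
    # rng.random() < 0.5 is treated as "head"; True counts as 1 in the sum
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n, seed=1))
```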


To define P(A) as a limit as n approaches infinity, however, does
emphasize that probability involves a long-run concept. This means that
when we toss a balanced die six times, it is very unlikely that each of
the six numbers will appear exactly once. If, however, we toss the die
over and over again a large number of times, we can expect in the
long run, or on the average, each of the six faces of the die to appear
about 1/6 of the time. It is exactly in this sense that we say that the
probability of getting any one of the numbers on a die in a random toss is 1/6.
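Both halves of this statement can be quantified. In six tosses of a balanced die there are 6^6 = 46656 equally likely sequences, of which 6! = 720 show every face exactly once, so the chance is 720/46656 = 5/324, or about 1.5 per cent. The Python sketch below computes that figure exactly and then simulates a long run of tosses (the seed is arbitrary) to show each face's relative frequency settling near 1/6.

```python
import math
import random
from collections import Counter
from fractions import Fraction

# Exact chance that six tosses of a balanced die show each face exactly once:
# 6! favourable orderings out of 6**6 equally likely sequences.
p_each_once = Fraction(math.factorial(6), 6 ** 6)
print(p_each_once, float(p_each_once))

# Long run: relative frequency of each face over many simulated tosses.
rng = random.Random(7)  # arbitrary seed, for reproducibility only
n = 60_000
counts = Counter(rng.randint(1, 6) for _ in range(n))
for face in range(1, 7):
    print(face, counts[face] / n)  # each should be close to 1/6, about 0.1667
```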


The two approaches, classical and empirical, though seemingly the same,
differ widely. In the former, P(A) and a/n were practically equal when n
was large, whereas in the latter we say that P(A) is the limit of a/n as n
tends to infinity. In the second approach, thus, the probability itself is
the limit of the relative frequency as the number of observations
increases indefinitely.


The statistical definition, though useful in practice, has difficulties 
from a mathematical point of view, since an actual limiting number may
not really exist. For this reason, modern probability theory has been 
developed axiomatically in which probability is an undefined concept 
much the same as point and line are undefined in geometry.* In this 
text, however, we shall confine ourselves only to the first approach. 


It may be pointed out at the very outset that probability should
not be understood in the sense of certainty. For example, when we say
that the probability of getting a head (or a tail) when an unbiased coin is tossed
is 1/2, it does not mean that if the coin is tossed 16 times we must get 8
heads and 8 tails. What it means is that when an unbiased coin is tossed a
large number of times, as n increases we will usually get close to 50
per cent heads and 50 per cent tails.


*Spiegel: Theory and Problems of Statistics, p. 100.
The probability obtained by following the relative frequency definition
is called a posteriori probability, as distinguished from a priori probability
obtained by following the classical approach. A priori probabilities are
specified by reason and no experiment is required to find out the probable
results, as in throwing a coin, throwing a die, drawing a card from a
pack, etc. However, in statistical work probability derived from pure
reason is seldom found. Most problems are so complex, and there are
so many ifs and buts, that it is impossible to find out the probability a priori.
In most fields of research it is a posteriori or empirical probability that is
employed.

(iii) Personalistic View of Probability 

While the relative frequency theory is still the most popular
definition of probability, the personalistic approach is steadily gaining
strength. The personalistic or subjective theorists regard probability
as a measure of personal confidence in a particular proposition, such
as a belief that Mr. X would top the list in the M. Com. examination of
Delhi University this year. A subjectivist would assign a weight
between zero and one to an event, according to his degree of
belief in its possible occurrence.

The subjective point of view grants that different reasonable indivi- 
duals may differ in their degrees of confidence, even when offered the 
same evidence. Consequently, personal probabilities for the same event 
may differ in the eyes of different decision-makers. 

It may be pointed out that each of the three interpretations of the
concept of probability has its own merits, and one may use whichever
approach is convenient and appropriate for the problem under
consideration.


Importance of the Concept of Probability 
Since its humble beginning at the gambling tables in the seventeenth
century, probability theory has been developed and employed to treat and
solve many weighty problems. It is the foundation of the classical decision
procedures of estimation and testing. Probability models can be
very useful for making predictions. Probability theory is concerned with the
construction of econometric models, with managerial decisions on planning and
control, with the occurrence of accidents of all kinds and with random
disturbances in an electrical mechanism. It is involved in the observation
of the life span of a radioactive atom, of the phenotypes of the offspring
in the crossing of two species of plants, in the discussion about the sex of an
unborn baby, etc. In fact, it has become an indispensable tool for all
types of formal studies that involve uncertainty. It should be noted that
the concept of probability is employed not only for various types of
scientific investigations, but also for many problems in everyday life. The role
played by probability in modern science is that of a substitute for
certainty.
CALCULATION OF PROBABILITY 

For understanding the technique of calculating probability the following
terms must be clearly understood :
1. Mutually Exclusive Events


Two events are said to be mutually exclusive or disjoint when both
cannot happen simultaneously in a single trial or, in other words, the
happening of one precludes the happening of the other and vice versa. For
example, if a single coin is tossed either head can be up or tail can be up;
both cannot be up at the same time. Similarly, a person may be either alive
or dead at a point of time; he cannot be both alive and dead at the
same time. To take another instance, when a die is tossed any of the six
faces may be up; the different cases are thus mutually exclusive because
no two faces can be uppermost at the same time. Symbolically, if A and
B are mutually exclusive events, P(AB)=0.

The following diagram will clearly illustrate the meaning of mutually 
exclusive events : 


2. Independent Events 

Two or more events are said to be independent when the outcome
of one does not affect, and is not affected by, the other. For example, if a
coin is tossed twice, the result of the second throw would in no way be
affected by the result of the first throw. Similarly, the results obtained
by throwing a die are independent of the results obtained by drawing an
ace from a pack of cards. To consider two events that are not independent,
let A stand for a firm's spending a large amount of money on advertisement
and B for its showing an increase in sales. Of course, advertising
does not guarantee higher sales, but the probability that the firm will
show an increase in sales will be higher if A has taken place.
3. Dependent Events 


Dependent events are those in which the occurrence or non-occurrence
of one event in any one trial affects the probability of other events in
other trials. For example, if a card is drawn from a pack of playing
cards and is not replaced, this will alter the probability that the second
card drawn is, say, an ace. Similarly, the probability of drawing a queen
from a pack of 52 cards is 4/52 or 1/13. But if the card drawn (a queen) is not
replaced in the pack, the probability of drawing a queen again is 3/51
(since the pack now contains only 51 cards, out of which there are 3
queens).

4. Equally Likely Events 


Events are said to be equally likely when one does not occur more 
often than the others. For example, if an unbiased coin or die is thrown, 
each face may be expected to be observed approximately the same number
of times in the long run. Similarly, the cards of a pack of playing
cards are so closely alike that we expect each card to appear equally often 
when a large number of draws are made with replacement. However, if 
the coin or the die is biased we should not expect each face to appear 
exactly the same number of times. 
5. Simple and Compound Events

In case of simple events we consider the probability of the happening
or not happening of single events. For example, we might be interested
in finding out the probability of drawing a red ball from a bag containing
10 white and 6 red balls. On the other hand, in case of compound
events we consider the joint occurrence of two or more events. For
example, if a bag contains 10 white and 6 red balls and if two successive
draws of 3 balls are made, we shall be finding out the probability of
getting 3 white balls in the first draw and 3 red balls in the second
draw; we are thus dealing with a compound event.


THEOREMS OF PROBABILITY 
There are two important theorems of probability, namely : 
1. The Addition Theorem ; and
2. The Multiplication Theorem. 


Addition Theorem 


The addition theorem states that if two events A and B are mutually
exclusive, the probability of the occurrence of either A or B is the sum of
the individual probabilities of A and B. Symbolically,

*P(A or B)=P(A)+P(B). 


The addition theorem is also known as the theorem of total 
probability. 

Proof of the Theorem. If an event A can happen in a₁ ways and B
in a₂ ways, then the number of ways in which either event can happen is
a₁ + a₂. If the total number of possibilities is n, then by definition the
probability of either the first or the second event happening is

$$\frac{a_1 + a_2}{n} = \frac{a_1}{n} + \frac{a_2}{n}$$

But

$$\frac{a_1}{n} = P(A) \quad \text{and} \quad \frac{a_2}{n} = P(B)$$


Hence P(A or B)=P(A)+P(B). 
The theorem can be extended to three or more mutually exclusive
events. Thus
P(A or B or C)=P(A)+P(B)+P(C).


Illustration 2. One card is drawn from a standard pack of 52. What is the
chance that it is either a king or a queen ? (M. Com., Delhi, 1968)

Solution. There are 4 kings and 4 queens in a pack of 52 cards.

∴ The probability that the card drawn is a king = 4/52


* P(A or B)=P(A)+P(B)=P(A∪B)
where A∪B reads "A union B".
Knowledge of permutation and combination is extremely useful in calculating
probabilities. For the sake of convenience in understanding the concept of permutation
and combination, an appendix is given at the end of the text.
and the probability that the card drawn is a queen = 4/52

∴ Since the events are mutually exclusive, the probability that the card drawn is
either a king or a queen
= 4/52 + 4/52 = 8/52 = 2/13

When events are not mutually exclusive. The addition theorem discussed
above is not applicable where the events are not mutually exclusive.
For example, if the probability of a man's buying a ready-made shirt is
0.65 and that of buying a ready-made pant is 0.25, we cannot calculate the
probability of his buying either a shirt or a pant by adding the two
probabilities, because the events are not mutually exclusive: he could very
well buy a shirt and a pant.
When events are not mutually exclusive the addition theorem has
to be modified. The probability that at least one of the two events A and
B which are not mutually exclusive will occur is :

*P(A or B)=P(A)+P(B)-P(A and B).

In this formula, we thus subtract P(A and B), namely, the proportion
of cases that are counted twice in P(A)+P(B). The theorem is thus
reconstructed in such a way as to render A and B mutually exclusive
again, because we are subtracting the cases in which A and B occur
simultaneously.


The following diagram will clarify this point : 


In case of three events,

$$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$$
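The three-event formula can be checked by brute-force counting. The sketch below uses an illustrative sample space of our own choosing (the numbers 1 to 30, each equally likely) with A = multiple of 2, B = multiple of 3 and C = multiple of 5; exact fractions avoid rounding error.

```python
from fractions import Fraction

# Illustrative sample space: the numbers 1 to 30, each equally likely.
space = range(1, 31)
A = {x for x in space if x % 2 == 0}   # multiple of 2
B = {x for x in space if x % 3 == 0}   # multiple of 3
C = {x for x in space if x % 5 == 0}   # multiple of 5

def P(event):
    """Classical probability: favourable cases / equally likely cases."""
    return Fraction(len(event), len(space))

direct = P(A | B | C)                  # count the union directly
formula = (P(A) + P(B) + P(C)
           - P(A & B) - P(A & C) - P(B & C)
           + P(A & B & C))             # addition theorem for three events
print(direct, formula)                 # 11/15 11/15
```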


Illustration 3. A person is known to hit the target in 3 out of 4 shots, whereas
another person is known to hit the target in 2 out of 3 shots. Find the probability of
the target being hit at all when they both try. (M.A. Econ., Delhi, 1969)

Solution. The probability that the first person hits the target = 3/4.
The probability that the second person hits the target = 2/3.
The events are not mutually exclusive because both of them may hit the target.
∴ The required probability
= 3/4 + 2/3 - (3/4 x 2/3) = 9/12 + 8/12 - 6/12 = 11/12
Here we have applied the rule
P(A or B)=P(A)+P(B)-P(A and B),
with P(A and B) = 3/4 x 2/3 because the two persons shoot independently.

*P(A or B) = P(A)+P(B)-P(AB)
or P(A∪B) = P(A)+P(B)-P(A∩B),
A and B being denoted by A∩B (read "A intersection B").
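Illustration 3 can be reproduced in a few lines. This sketch assumes, as the solution does, that the two persons shoot independently, so P(AB) = P(A) x P(B); it also cross-checks the answer through the complement (the target is missed only if both miss).

```python
from fractions import Fraction

p_a = Fraction(3, 4)   # first person hits
p_b = Fraction(2, 3)   # second person hits

# Addition rule for events that are not mutually exclusive;
# P(AB) = P(A) x P(B) because the shots are assumed independent.
p_hit = p_a + p_b - p_a * p_b

# Cross-check: the target escapes only if both persons miss.
p_hit_check = 1 - (1 - p_a) * (1 - p_b)

print(p_hit, p_hit_check)   # 11/12 11/12
```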


Illustration 4. A bag contains 30 balls numbered from 1 to 30. One ball is
drawn at random. Find the probability that the number of the drawn ball will be a
multiple of (a) 5 or 9, and (b) 5 or 6.

Solution. (a) The probability of the number being a multiple of
5 (5, 10, 15, 20, 25, 30) = 6/30.
The probability of the number being a multiple of
9 (9, 18, 27) = 3/30.
Since the events are mutually exclusive, the probability of the number being
a multiple of 5 or 9 will be
6/30 + 3/30 = 9/30 = 3/10.

(b) The probability of the number being a multiple of
5 (5, 10, 15, 20, 25, 30) is 6/30.
The probability of the number being a multiple of
6 (6, 12, 18, 24, 30) is 5/30.
The probability of getting a number which is a multiple of either 5 or 6 would then appear to be
6/30 + 5/30 = 11/30.
But this is wrong, since 30 is a multiple of 5 as well as of 6. The drawing of the
ball numbered 30 entails the occurrence of both the events, and hence the probability
of getting a number which is a multiple of 5 or 6 is
6/30 + 5/30 - 1/30 = 10/30 = 1/3.
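Counting the balls directly confirms both parts of Illustration 4, including the correction for ball 30 in part (b). This is a minimal sketch; the helper name `prob` is illustrative only.

```python
from fractions import Fraction

balls = range(1, 31)   # balls numbered 1 to 30

def prob(condition):
    """Fraction of the 30 equally likely balls that satisfy condition."""
    favourable = sum(1 for n in balls if condition(n))
    return Fraction(favourable, len(balls))

p_5_or_9 = prob(lambda n: n % 5 == 0 or n % 9 == 0)   # no overlap up to 30
p_5_or_6 = prob(lambda n: n % 5 == 0 or n % 6 == 0)   # ball 30 counted once
print(p_5_or_9, p_5_or_6)   # 3/10 1/3
```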


Multiplication Theorem 

This theorem states that if two events A and B are independent, the
probability that they will both occur is equal to the product of their
individual probabilities. Symbolically, if A and B are independent, then

P(A and B)=P(A) x P(B). 

The theorem can be extended to three or more independent events. 

Thus 
P(A, B and C)=P(A) x P(B) x P(C).

Proof of the Theorem. If an event A can happen in n₁ ways, of which
a₁ are successful, and the event B can happen in n₂ ways, of which a₂ are
successful, we can combine each successful event in the first with each
successful event in the second case. Thus the total number of successful
possibilities in both cases is a₁ x a₂. Similarly, the total number of
possible cases is n₁ x n₂.

Then by definition the probability of both independent events
happening is

$$\frac{a_1 \times a_2}{n_1 \times n_2} = \frac{a_1}{n_1} \times \frac{a_2}{n_2}$$

But

$$\frac{a_1}{n_1} = P(A) \quad \text{and} \quad \frac{a_2}{n_2} = P(B)$$

∴ P(A and B)=P(A) x P(B).


In a similar way the theorem can be extended to three or more
events. 


Illustration 5. A problem in statistical methods is given to four students A, B,
C and D. Their chances of solving it are p_A, p_B, p_C and p_D respectively
[values illegible]. What is the probability that the problem will be solved ?

Solution :
Probability that A fails to solve the problem is 1 - p_A; similarly, the
probabilities that B, C and D fail are 1 - p_B, 1 - p_C and 1 - p_D.

Since the events are independent, the probability that all the four students fail
to solve the problem is

(1 - p_A) x (1 - p_B) x (1 - p_C) x (1 - p_D)

∴ The probability that the problem will be solved

= 1 - (1 - p_A) x (1 - p_B) x (1 - p_C) x (1 - p_D)

Illustration 6. A man wants to marry a girl having the following qualities: white complexion
(the probability of getting such a girl is one in twenty); handsome dowry (the
probability of getting this is one in fifty); westernised manners and etiquettes (the
probability here is one in hundred). Find out the probability of his getting married to
such a girl when the possession of these three attributes is independent.

Solution. Probability of a girl with white complexion
= 1/20 = 0.05
Probability of a girl with handsome dowry
= 1/50 = 0.02
Probability of a girl with westernised manners
= 1/100 = 0.01
Since the events are independent, the probability of simultaneous occurrence of
all these qualities
= 0.05 x 0.02 x 0.01 = 0.00001


have occurred (or vice versa). The probability attached to such an
event is called the conditional probability and is denoted by P(A/B) or, in
other words, the probability of A given that B has occurred.
If two events A and B are dependent, then the conditional probability
of B given A is

$$P(B/A) = \frac{P(AB)}{P(A)}$$

Proof. Suppose a is the number of cases for the simultaneous
happening of A and B out of the a + a₁ cases in which A can happen with or
without the happening of B, n being the total number of possible cases. Then

$$P(B/A) = \frac{a}{a + a_1} = \frac{a/n}{(a + a_1)/n} = \frac{P(AB)}{P(A)}$$

Similarly it can be shown that

$$P(A/B) = \frac{P(AB)}{P(B)}$$

The general rule of multiplication in its modified form in terms of
conditional probabilities becomes :
P(A and B)=P(B) x P(A/B)
or P(A and B)=P(A) x P(B/A)

For three events A, B and C, we have

P(ABC)=P(A) x P(B/A) x P(C/AB)

i.e., the probability of occurrence of A, B and C is equal to the probability
of A, times the probability of B given that A has occurred, times the
probability of C given that both A and B have occurred.


Illustration 7. A bag contains 5 white and 3 black balls. Two balls are drawn
at random one after the other without replacement. Find the probability that both
balls drawn are black.

Solution. Probability of drawing a black ball in the first attempt is
P(A) = 3/(5+3) = 3/8
Probability of drawing the second black ball given that the first ball drawn is
black
P(B/A) = 2/(5+2) = 2/7
∴ The probability that both balls drawn are black is given by
P(AB)=P(A) x P(B/A)
= 3/8 x 2/7 = 6/56 = 3/28
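Listing every ordered draw of two balls confirms that the conditional-probability rule P(A) x P(B/A) gives the right answer in Illustration 7. In this sketch the balls are treated as distinct objects, so all 8 x 7 ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import permutations

bag = ["W"] * 5 + ["B"] * 3      # 5 white and 3 black balls

# All ordered pairs of two different balls (drawing without replacement).
# permutations works on positions, so the 8 balls count as distinct objects.
draws = list(permutations(bag, 2))
both_black = sum(1 for first, second in draws if first == second == "B")
p_enum = Fraction(both_black, len(draws))

p_rule = Fraction(3, 8) * Fraction(2, 7)   # P(A) x P(B/A)
print(p_enum, p_rule)   # 3/28 3/28
```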
Illustration 8. Find the probability of drawing a queen, a king and a knave
in that order from a pack of cards in three consecutive draws, the cards drawn not
being replaced.

Solution. The probability of drawing a queen = 4/52.
The probability of drawing a king after a queen has been drawn = 4/51.
The probability of drawing a knave given that a queen and a king have been
drawn = 4/50.
Since they are dependent events, the required probability of the compound event
is :
4/52 x 4/51 x 4/50 = 64/132,600 = 0.00048 (approximately 0.0005).
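The chain rule P(A) x P(B/A) x P(C/AB) used in Illustration 8 can be written as a loop in which the pack shrinks by one card at each draw. This is a sketch under the assumption that each wanted kind of card is distinct from the others (as queen, king and knave are); `prob_in_order` is an illustrative name.

```python
from fractions import Fraction

def prob_in_order(pack_size, counts):
    """Chain rule for draws without replacement, in a fixed order.

    counts[i] is how many cards of the i-th wanted kind the pack holds;
    the wanted kinds are assumed to be different from one another.
    """
    p = Fraction(1)
    remaining = pack_size
    for c in counts:
        p *= Fraction(c, remaining)   # conditional probability at this draw
        remaining -= 1                # the drawn card is not replaced
    return p

p = prob_in_order(52, [4, 4, 4])      # queen, then king, then knave
print(p, float(p))                    # 8/16575 (that is 64/132600), about 0.00048
```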
Mathematical Expectation


If p is the probability of the occurrence of an event in a single
trial, then the expected number of occurrences, or expectation, of that event
in n trials is defined as np.


Illustration 9. In 1,200 trials of a throw of two dice, what is the expected
number of times that the sum will be less than 5 ?

Solution. Total number of equally likely cases = 6 x 6 = 36
Number of cases favourable to the event in a single throw of two dice = 6,
i.e., (1+1), (1+2), (1+3), (2+1), (2+2), (3+1), so that p = 6/36 = 1/6.
The expected number of times the total will be less than 5 in 1,200 trials
= 1/6 x 1,200 = 200.
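Illustration 9 can be reproduced by listing the 36 equally likely outcomes of two dice and applying expectation = n x p. A minimal sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # 36 equally likely cases
favourable = [o for o in outcomes if sum(o) < 5]     # sum less than 5
p = Fraction(len(favourable), len(outcomes))
expected = p * 1200                                  # expectation = n x p
print(len(favourable), p, expected)                  # 6 1/6 200
```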


Thus the expectation may be regarded as the likely number of successes
to occur in n trials. In the above case we have determined that
in 1,200 trials of a throw of two dice, we expect 200 throws with a sum
less than 5. However, it does not mean that this event must happen
exactly 200 times when dice are thrown 1,200 times; in fact, it does not even mean that
this event is very likely to happen. What it means is that this event


events. If probability p is determined as the relative frequency in n trials,
then the expectation in these n trials is equal to the actual number of

The concept of expectation is of great use in the analysis of all 
games of chance requiring the evaluation of the player's expectation. 
If p is the probability of success and m the amount which a person
is to receive in the event of success, his expectation would be pm. We


Illustration 10. A and B play for a prize of Rs. 1,000. A is to throw a die
first and is to win if he throws 6. If he fails, B is to throw and is to win if he throws 6 or
5. If he fails, A is to throw again and to win if he throws 6, 5 or 4, and so on. Find their
respective expectations.

Solution. Probability of A's winning in the 1st throw
= 1/6
Probability of B's winning in the 2nd throw
= 5/6 x 2/6 = 5/18
Probability of A's winning in the 3rd throw
= 5/6 x 4/6 x 3/6 = 5/18
Probability of B's winning in the 4th throw
= 5/6 x 4/6 x 3/6 x 4/6 = 5/27
Probability of A's winning in the 5th throw
= 5/6 x 4/6 x 3/6 x 2/6 x 5/6 = 25/324
Probability of B's winning in the 6th throw
= 5/6 x 4/6 x 3/6 x 2/6 x 1/6 x 6/6 = 5/324
Total of A's chances of success
= 1/6 + 5/18 + 25/324 = 169/324
Total of B's chances of success
= 5/18 + 5/27 + 5/324 = 155/324
Their respective expectations are
A = 169/324 x 1,000 = Rs. 521.60
B = 155/324 x 1,000 = Rs. 478.40
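The throw-by-throw reasoning of Illustration 10 fits naturally into a loop: keep track of the probability that the game is still undecided, and credit each throw's winning chance to whoever is throwing. A sketch with exact fractions; `game_chances` is an illustrative name.

```python
from fractions import Fraction

def game_chances():
    """Win chances when the k-th throw wins on any of the top k faces.

    Throw 1 (A) wins on 6; throw 2 (B) on 6 or 5; throw 3 (A) on 6, 5 or 4;
    ... throw 6 (B) wins on any face, so the game always ends by then.
    """
    chances = {"A": Fraction(0), "B": Fraction(0)}
    reach = Fraction(1)                 # P(game still undecided at this throw)
    for k in range(1, 7):
        player = "A" if k % 2 == 1 else "B"
        win = Fraction(k, 6)            # k winning faces out of 6
        chances[player] += reach * win
        reach *= 1 - win                # this throw failed
    return chances

c = game_chances()
prize = 1000
for player, p in c.items():
    print(player, p, round(float(p * prize), 2))
# A 169/324 521.6
# B 155/324 478.4
```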


discrete random variable which can assume the values X₁, X₂, ..., Xₖ
with respective probabilities p₁, p₂, ..., pₖ, where p₁ + p₂ + ... + pₖ = 1,
the mathematical expectation of X, denoted by E(X), is defined as

$$E(X) = p_1 X_1 + p_2 X_2 + \cdots + p_k X_k$$


Illustration 11. In a given business venture a man can make a profit of Rs. 1,000
with probability 0.8 or take a loss of Rs. 400 with probability 0.2. Determine his
expectation.

Solution. E(X) = p₁X₁ + p₂X₂ + ... + pₖXₖ
= 0.8 x 1,000 + 0.2 x (-400)* = Rs. 720.
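The definition E(X) = p₁X₁ + p₂X₂ + ... + pₖXₖ translates directly into code. The sketch below adds a check that the probabilities sum to 1 (a design choice of ours, not from the text) and enters the loss of Illustration 11 as a negative value.

```python
def expectation(values, probs):
    """E(X) = p1*X1 + p2*X2 + ... + pk*Xk for a discrete random variable."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    return sum(p * x for x, p in zip(values, probs))

# Profit of Rs. 1,000 with probability 0.8; loss (negative) of Rs. 400 with 0.2.
print(expectation([1000, -400], [0.8, 0.2]))   # 720.0
```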

Bayes's Theorem 

One of the most interesting (and controversial) applications of the
results of probability theory involves estimating unknown probabilities and
making decisions on the basis of new (sample) information. Statistical
decision theory is a new field of study which has its foundation in just
such problems.
Bayes's theorem, named after Thomas Bayes (an English
philosopher) and published in 1763 in a short paper, has become one of
the most famous memoirs in the history of science and one of the most
controversial. His contribution consists primarily of a unique method for
calculating conditional probabilities. The so-called "Bayesian" approach
to this problem addresses itself to the question of determining the
probability of some event, Eᵢ, given that another event, A, has been (or
will be) observed, i.e., determining the value of P(Eᵢ/A). The event A is
usually thought of as sample information, so that Bayes's rule is concerned
with determining the probability of an event given certain sample
information. For example, a sample output of 2 defectives in 50 trials
(event A) might be used to estimate the probability that a machine is not
working correctly (event Eᵢ), or you might use the results of your first
examination in statistics (event A) as sample evidence in estimating the
probability of getting a first class (event Eᵢ).

*In case of loss, the variable is negative.

Bayes's theorem is based on the formula for conditional probability
explained earlier. Let

A₁ and A₂ = the set of events which are mutually exclusive (the two
events cannot occur together) and exhaustive (the
combination of the two events is the entire experiment),

and

B = a simple event which intersects each of the A events
as shown in the diagram below :


(Observe the diagram! The part of B which is within A₁ represents
the area "A₁ and B" and the part of B within A₂ represents the area "A₂
and B".)

Then the probability of event A₁, given event B, is*

$$P(A_1/B) = \frac{P(A_1 \text{ and } B)}{P(B)}$$

and, similarly, the probability of event A₂, given event B, is

$$P(A_2/B) = \frac{P(A_2 \text{ and } B)}{P(B)}$$

where P(B) = P(A₁ and B) + P(A₂ and B),

P(A₁ and B) = P(A₁) x P(B/A₁), and

P(A₂ and B) = P(A₂) x P(B/A₂).
In general, let A₁, A₂, A₃, ..., Aₙ be the set of n mutually exclusive
and exhaustive events. The above expressions may be summarized as
follows :

$$P(A_i/B) = \frac{P(A_i \text{ and } B)}{P(B)}$$

where P(B) = P(A₁ and B) + P(A₂ and B) + ... + P(Aₙ and B)
and P(Aᵢ and B) = P(Aᵢ) x P(B/Aᵢ).
and Р(А‹ and B)=P(A,) x P(B/A,), 


information is taken into account. Posterior probabilities are always
conditional probabilities, the conditional event being the sample
information. Thus, a prior probability, which is an unconditional probability,
becomes a posterior probability, which is a conditional probability, by


* P(A₁ and B) = P(B and A₁) = P(B) x P(A₁/B)
Thus P(A₁/B) = P(A₁ and B)/P(B).

The following example shall illustrate the application of Bayes's
theorem :


Illustration 12. Assume that a factory has two machines. Past records show
that machine 1 produces 30% of the items of output and machine 2 produces 70% of the
items. Further, 5% of the items produced by machine 1 were defective and only 1%
produced by machine 2 were defective. If a defective item is drawn at random, what is
the probability that the defective item was produced by machine 1 or machine 2 ?
Solution. Let A₁ = the event of drawing an item produced by machine 1,
A₂ = the event of drawing an item produced by machine 2,
and B = the event of drawing a defective item produced either by
machine 1 or machine 2.
Then from the first information,
P(A₁) = 30% = 0.30
P(A₂) = 70% = 0.70
From the additional information
P(B/A₁) = 5% = 0.05
P(B/A₂) = 1% = 0.01
The required values are tabulated below:
COMPUTATION OF POSTERIOR PROBABILITIES

(1)      (2)           (3)                 (4)                  (5)
Event    Prior         Conditional         Joint                Posterior (revised)
         probability   probability of      probability          probability
                       event B given A
         P(Aᵢ)         P(B/Aᵢ)             P(Aᵢ and B)          P(Aᵢ/B)
                                           = (2) x (3)          = (4) ÷ P(B)
A₁       0.30          0.05                0.015                0.015/0.022 = 0.682
A₂       0.70          0.01                0.007                0.007/0.022 = 0.318
Total    1.00                              P(B) = 0.022         1.000
| 


Without the additional information, we may be inclined to say that the defective item
is drawn from machine 2 output, since P(A₂) = 70% is larger than P(A₁) = 30%. With the
additional information, we may give a better answer. The probability that the defective
item was produced by machine 1 is 0.682 or 68.2% and that by machine 2 is only 0.318
or 31.8%. We may now say that the defective item is more likely drawn from the output
produced by machine 1.
The above answer may be checked by the actual number of items as follows :
If 10,000 items were produced by the two machines in a given period, the number
of items produced by machine 1 is
10,000 x 30% = 3,000,
and the number of items produced by machine 2 is
10,000 x 70% = 7,000.
The number of defective items produced by machine 1 is
3,000 x 5% = 150,
and the number of defective items produced by machine 2 is
7,000 x 1% = 70.
The probability that a defective item was produced by machine 1 is
150/(150 + 70) = 150/220 = 0.682
and by machine 2 is
70/(150 + 70) = 70/220 = 0.318.
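The table for Illustration 12 follows a fixed recipe: multiply each prior by its conditional probability, add the products to get P(B), then divide each product by P(B). A minimal sketch of that recipe for any number of mutually exclusive and exhaustive events; `posterior` is an illustrative name.

```python
def posterior(priors, likelihoods):
    """Bayes's theorem: P(Ai/B) = P(Ai) P(B/Ai) / sum over j of P(Aj) P(B/Aj)."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]   # P(Ai and B)
    p_b = sum(joint)                                           # P(B)
    return p_b, [j / p_b for j in joint]

p_b, post = posterior([0.30, 0.70], [0.05, 0.01])   # machine 1, machine 2
print(round(p_b, 3), [round(x, 3) for x in post])   # 0.022 [0.682, 0.318]
```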


It is clear from the above illustration that Bayes's theorem provides
a powerful method for improving the quality of probability estimates used to aid the
management in decision-making under uncertainty. As we proceed with
repeated experiments, evidence accumulates and modifies the initial prior
probabilities and, thereby, modifies the intensity of a decision-maker's belief
in various hypotheses. Repeated estimates will soon produce such low
posterior probabilities for some hypotheses that they can be eliminated
from further consideration. In other words, the more evidence we
accumulate, the less important are the prior probabilities. The only
restriction on the application of the Bayesian rule is that all hypotheses must be
tenable in a given situation and that none is assigned a prior probability
of 0 or 1.
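The claim that accumulating evidence outweighs the prior can be seen in a small sketch of repeated updating. It assumes a hypothetical setting not in the text: a whole batch comes from one of the two machines of Illustration 12 (defect rates 5% and 1%), and a made-up inspection record of 6 defectives in 60 items is processed one item at a time, starting from two quite different priors.

```python
def update(prior_m1, defective, p_def=(0.05, 0.01)):
    """One Bayesian update of P(batch came from machine 1) after one item."""
    like1 = p_def[0] if defective else 1 - p_def[0]
    like2 = p_def[1] if defective else 1 - p_def[1]
    joint1 = prior_m1 * like1
    joint2 = (1 - prior_m1) * like2
    return joint1 / (joint1 + joint2)

# Hypothetical inspection record: 6 defectives among 60 items.
evidence = [True] * 6 + [False] * 54

results = {}
for start in (0.05, 0.30):          # two quite different prior beliefs
    p = start
    for item in evidence:
        p = update(p, item)         # each posterior becomes the next prior
    results[start] = p
    print(start, round(p, 4))
```

Although the two priors differ by 0.25, the two final posteriors end up within a few thousandths of each other.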


MISCELLANEOUS ILLUSTRATIONS 


Illustration 13. An urn contains 10 white and 6 black balls. Find the probability
that a blind-folded person in one draw shall obtain a white ball.


Solution. Total number of balls in the urn = 10 + 6 = 16
Number of white balls = 10
∴ p = Number of favourable cases / Total number of equally likely cases
= 10/16 = 5/8


Illustration 14. A bag contains 6 white, 4 red and 10 black balls. Two balls
are drawn at random. Find the probability that they will both be black.


Solution. Total number of balls in the bag
= 6 + 4 + 10 = 20
Two balls can be drawn from 20 in ²⁰C₂ ways
= (20 x 19)/(2 x 1) = 190 ways
And 2 balls can be drawn from 10 black balls in ¹⁰C₂ ways
= (10 x 9)/(2 x 1) = 45 ways
∴ The probability that the two balls drawn at random are black
= 45/190 = 9/38 = 0.237.


Illustration 15. A bag contains 8 white and 4 red balls. Five balls are drawn at
random. What is the probability that 2 of them are red and 3 white ?


Solution. Total number of balls in the bag
= 8 + 4 = 12
Number of balls drawn = 5


5 balls can be drawn from 12 in ¹²C₅ ways, 2 red balls can be drawn from 4 red
in ⁴C₂ ways and 3 white balls can be drawn from 8 white in ⁸C₃ ways.


∴ The number of favourable cases = ⁴C₂ x ⁸C₃, and the required probability is

$$\frac{{}^{4}C_2 \times {}^{8}C_3}{{}^{12}C_5} = \frac{4 \times 3}{2 \times 1} \times \frac{8 \times 7 \times 6}{3 \times 2 \times 1} \times \frac{5 \times 4 \times 3 \times 2 \times 1}{12 \times 11 \times 10 \times 9 \times 8} = \frac{14}{33} = 0.424$$


Illustration 16. From a pack of 52 cards two are drawn at random. Find the
probability that one is a king and the other a queen.


Solution. Two cards can be drawn from 52 in ⁵²C₂ ways.
A king can be drawn in ⁴C₁ ways, and
a queen can be drawn in ⁴C₁ ways.
Since each of the former can be associated with each of the latter, a king and a
queen can be drawn in ⁴C₁ x ⁴C₁ ways.

$$p = \frac{\text{No. of favourable cases}}{\text{Total number of equally likely cases}} = \frac{{}^{4}C_1 \times {}^{4}C_1}{{}^{52}C_2} = \frac{4 \times 4 \times 2}{52 \times 51} = \frac{8}{663} = 0.012$$
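Illustrations 14 to 16 all use the same counting pattern: the product of the ways of choosing each wanted colour (or kind), divided by the ways of choosing the whole handful. A sketch of that pattern using `math.comb`; `prob_draw` and the dictionary layout are illustrative choices, and the 44 "other" cards in the last line simply stand for the rest of the pack.

```python
from fractions import Fraction
from math import comb, prod

def prob_draw(bag, wanted):
    """P(a random handful holds exactly wanted[kind] items of each kind).

    bag and wanted map kind -> count; the handful size is sum(wanted.values()).
    Kinds in bag but not in wanted are taken to be drawn 0 times.
    """
    k = sum(wanted.values())
    favourable = prod(comb(bag[c], wanted.get(c, 0)) for c in bag)
    return Fraction(favourable, comb(sum(bag.values()), k))

print(prob_draw({"white": 6, "red": 4, "black": 10}, {"black": 2}))             # 9/38
print(prob_draw({"white": 8, "red": 4}, {"red": 2, "white": 3}))                # 14/33
print(prob_draw({"king": 4, "queen": 4, "other": 44}, {"king": 1, "queen": 1}))  # 8/663
```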


Illustration 17. A card is drawn from a pack of playing cards and then another
card is drawn without the first being replaced. What is the probability of drawing (i)
two aces, and (ii) two spades ?


Solution. (i) The probability that the first card is an ace is 4/52. When an ace
has been drawn there are three aces in the 51 cards left. Therefore, the probability
that the second card should also be an ace is 3/51. Hence the probability that both are
aces is
4/52 x 3/51 = 1/221
(ii) The probability that the first card is a spade is 13/52. The probability that
the second card should also be a spade is 12/51. Therefore, the probability that both
cards are spades is
13/52 x 12/51 = 1/17
Illustration 18. A bag contains 6 white and 4 black balls and a second one 4
white and 8 black balls. One of the bags is chosen at random and a draw of 2 balls is
made from it. Find the probability that one is white and the other is black.


Solution. There are 50% chances of choosing either bag. The probability that
the first bag was selected and a draw of 2 balls gives one white and one black ball
= 1/2 x (⁶C₁ x ⁴C₁)/¹⁰C₂ = 1/2 x 24/45 = 4/15.
The probability that the second bag was selected and a draw of 2 balls gives one
white and one black ball
= 1/2 x (⁴C₁ x ⁸C₁)/¹²C₂ = 1/2 x 32/66 = 8/33.
Since the events are mutually exclusive, the probability of the event
is 4/15 + 8/33 = 84/165 = 0.509.
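Illustration 18 combines the two rules: within each bag the multiplication rule (choose the bag, then draw), and across bags the addition rule for mutually exclusive cases. A minimal sketch; `one_of_each` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def one_of_each(white, black):
    """P(one white and one black) when 2 balls are drawn from one bag."""
    return Fraction(comb(white, 1) * comb(black, 1), comb(white + black, 2))

bags = [(6, 4), (4, 8)]                 # (white, black) in bag 1 and bag 2
p_bag = Fraction(1, len(bags))          # each bag equally likely to be chosen
p = sum(p_bag * one_of_each(w, b) for w, b in bags)
print(p, float(p))                      # 28/55 (= 84/165), about 0.509
```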

Illustration 19. One bag contains 4 white balls and 2 black balls. Another
contains 3 white balls and 5 black balls. If one ball is drawn from each bag, find the
probability that (a) both are white, (b) both are black, and (c) one is white and one is
black. (M.A. Econ., Delhi, 1968)


Solution. (a) Probability of drawing a white ball from the first bag = 4/6.
Probability of drawing a white ball from the second bag = 3/8.
Since the events are independent, the probability that both the balls are white
= 4/6 x 3/8 = 12/48 = 1/4.

(b) Probability of drawing a black ball from the first bag = 2/6.
Probability of drawing a black ball from the second bag = 5/8.
Probability that both are black = 2/6 x 5/8 = 10/48 = 5/24.
(c) The event "one is white and one is black" is the same as the event "either the
first is white and the second is black or the first is black and the second is white".


∴ The probability that one is white and one is black
= 4/6 x 5/8 + 2/6 x 3/8 = 20/48 + 6/48 = 26/48 = 13/24.


Illustration 20. A six-faced die is so biased that it is twice as likely to show an
even number as an odd number when thrown. It is thrown twice. What is the
probability that the sum of the two numbers thrown is even ?


Solution. Let p be the probability of getting an even number in a single throw
of the die and q be that of an odd number. Then, as per the given question,
p = 2q and p + q = 1, so that
p = 2/3, q = 1/3.
There are two mutually exclusive cases in which the event can occur :


(a) an odd number in the first throw and again an odd number in the second
throw ; and


(b) an even number in the first throw and again an even number in the second
throw.


Since the throws are independent, the probabilities of these cases are
1/3 x 1/3 and 2/3 x 2/3 respectively.
Since the cases are mutually exclusive, the required probability
= 1/9 + 4/9 = 5/9.
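Illustration 20 can be checked by enumerating the parity of the two throws. This sketch derives P(odd) = 1/3 from the condition P(even) = 2 x P(odd) and assumes, as the text does, that the throws are independent.

```python
from fractions import Fraction
from itertools import product

# P(even) = 2 x P(odd) and P(even) + P(odd) = 1, so P(odd) = 1/3.
p_odd = Fraction(1, 3)
p_even = 2 * p_odd

# Enumerate the parity of (first throw, second throw); throws independent.
p_sum_even = sum(
    (p_even if a == "even" else p_odd) * (p_even if b == "even" else p_odd)
    for a, b in product(["even", "odd"], repeat=2)
    if a == b                     # even + even or odd + odd gives an even sum
)
print(p_sum_even)   # 5/9
```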
Illustration 21. In a single throw of two dice what is the probability of getting
(a) a total of 8 ; and
(b) a total different from 8 ?


Solution. (a) A total of 8 can be obtained as follows :
(2, 6), (3, 5), (4, 4), (5, 3), (6, 2).
In (4, 4), interchanging the two dice gives no new case, since the second four is
the same as that of the first dice. So we have to consider it only once. Thus there
are 5 favourable cases.
The number of equally likely ways in which two dice can fall is 6 x 6 = 36.
∴ Required probability of getting a total of 8 in a single throw of two dice is
5/36.
(b) Since the probability of the faces of the two dice totalling 8 or a number
different from 8 is one, the probability of getting a total different from 8 is
1 - 5/36 = 31/36.
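Listing all 36 ordered outcomes of two dice shows why (4, 4) is counted only once while (2, 6) and (6, 2) count separately. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))    # (first die, second die)
eights = [o for o in outcomes if sum(o) == 8]
p8 = Fraction(len(eights), len(outcomes))
print(eights)        # [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
print(p8, 1 - p8)    # 5/36 31/36
```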


Illustration 22, A bagcontains 5 white and 8 red balls, Two drawings of3 
ls аге made such that (a) the balls are replaced before the second trial, and (b) the 
з are not replaced before the second trial. Find the probability that ‘the first draw- 
will give 3 white and the second 3 red balls in each case, 
Solution. (a) When balls are replaced:

Total number of balls in the bag = 5 + 8 = 13.

3 balls can be drawn from 13 in ${}^{13}C_3$ ways.

3 white balls can be drawn from 5 in ${}^{5}C_3$ ways.

3 red balls can be drawn from 8 in ${}^{8}C_3$ ways.

∴ The probability of drawing 3 white balls at the first trial is

$$\frac{{}^{5}C_3}{{}^{13}C_3} = \frac{10}{286} = \frac{5}{143}$$

and the probability of drawing 3 red balls at the second trial is

$$\frac{{}^{8}C_3}{{}^{13}C_3} = \frac{56}{286} = \frac{28}{143}.$$

The probability of the compound event is

$$\frac{5}{143} \times \frac{28}{143} = \frac{140}{20449} \approx 0.0068.$$
(b) When balls are not replaced:

At the first trial 3 white balls can be drawn in ${}^{5}C_3$ ways.

∴ The probability of drawing three white balls at the first trial is

$$\frac{{}^{5}C_3}{{}^{13}C_3} = \frac{5}{143}.$$

When the white balls have been drawn and not replaced, the bag contains 2 white and 8 red balls. Therefore, at the second trial 3 balls can be drawn from 10 in ${}^{10}C_3$ ways and 3 red balls can be drawn from 8 in ${}^{8}C_3$ ways.

∴ The probability of drawing 3 red balls at the second trial is

$$\frac{{}^{8}C_3}{{}^{10}C_3} = \frac{56}{120} = \frac{7}{15}.$$

∴ The probability of the compound event is

$$\frac{5}{143} \times \frac{7}{15} = \frac{7}{429} \approx 0.016.$$
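The two cases of Illustration 22 differ only in the bag from which the second drawing is made. The minimal sketch below uses `math.comb` for the ${}^{n}C_r$ counts and `fractions.Fraction` to keep the answers exact. The helper `draw_all` is a hypothetical name introduced for this example.

```python
from fractions import Fraction
from math import comb

def draw_all(k, wanted, total):
    """P(all k balls drawn together come from the `wanted` balls of one colour)."""
    return Fraction(comb(wanted, k), comb(total, k))

white, red = 5, 8
total = white + red

# (a) with replacement: the bag is full (13 balls) again for the second trial
p_replaced = draw_all(3, white, total) * draw_all(3, red, total)
# (b) without replacement: 3 white balls are gone, leaving 10 balls (8 red)
p_not_replaced = draw_all(3, white, total) * draw_all(3, red, total - 3)

print(p_replaced, float(p_replaced))
print(p_not_replaced, float(p_not_replaced))
```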


Illustration 23. An urn A contains 2 white and 4 black balls. Another urn B contains 5 white and 7 black balls. A ball is transferred from the urn A to the urn B. Then a ball is drawn from urn B. Find the probability that it will be white.

(M.A. Econ., Delhi, 1967)

Solution. The ball transferred can either be black or white.

(i) When a white ball is transferred:

Probability of drawing a white ball from urn A = $\frac{2}{2+4} = \frac{1}{3}$.

Now urn B has 6 white and 7 black balls, so the probability of drawing a white ball from it is $\frac{6}{6+7} = \frac{6}{13}$.

∴ The probability of the compound event, transferring a white ball and then drawing a white ball, is

$$\frac{1}{3} \times \frac{6}{13} = \frac{2}{13}.$$

(ii) When a black ball is transferred:

Probability of drawing a black ball from urn A = $\frac{4}{2+4} = \frac{2}{3}$.

Now urn B has 5 white and 8 black balls. The probability of drawing a white ball from it is $\frac{5}{5+8} = \frac{5}{13}$.

∴ The chance of the compound event, transferring a black ball and then drawing a white ball, is

$$\frac{2}{3} \times \frac{5}{13} = \frac{10}{39}.$$

The two events are mutually exclusive. Hence the probability of transferring a ball from urn A to B and then drawing a white ball from the second urn is

$$\frac{2}{13} + \frac{10}{39} = \frac{6}{39} + \frac{10}{39} = \frac{16}{39} \approx 0.41.$$
Illustration 24. A bag contains 2 white and 3 black balls. Four persons A, B, C and D, in the order named, each draw one ball and do not replace it. The first person to draw a white ball receives Rs. 200. Determine their expectations.


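Solution. Since the bag contains only 3 black balls, one of the four persons must draw a white ball within the first four draws.

Probability that A wins = $\frac{2}{5}$.

∴ A's expectation = $\frac{2}{5} \times 200$ = Rs. 80.

Probability that A loses and B wins = $\frac{3}{5} \times \frac{2}{4} = \frac{3}{10}$.

∴ B's expectation = $\frac{3}{10} \times 200$ = Rs. 60.

Probability that A and B lose but C wins = $\frac{3}{5} \times \frac{2}{4} \times \frac{2}{3} = \frac{1}{5}$.

∴ C's expectation = $\frac{1}{5} \times 200$ = Rs. 40.

Probability that A, B and C lose and D wins = $\frac{3}{5} \times \frac{2}{4} \times \frac{1}{3} \times \frac{2}{2} = \frac{1}{10}$.

∴ D's expectation = $\frac{1}{10} \times 200$ = Rs. 20.

Hence the expectations of A, B, C and D respectively are Rs. 80, Rs. 60, Rs. 40 and Rs. 20.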
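The chain of "everyone before him drew black, then he draws white" products can be generated mechanically. The following is a sketch that assumes draws without replacement and at least one white ball in the bag. The function name `first_white_probabilities` is illustrative only.

```python
from fractions import Fraction

def first_white_probabilities(white, black, players):
    """P(each player, in turn, is the first to draw a white ball), no replacement.

    Assumes white >= 1, so a white ball must eventually appear.
    """
    probs = []
    p_no_white_yet = Fraction(1)
    for _ in range(players):
        total = white + black
        # this player wins if everyone before him drew black and he now draws white
        probs.append(p_no_white_yet * Fraction(white, total))
        if black == 0:
            break  # only white balls remain, so this player won for certain if reached
        p_no_white_yet *= Fraction(black, total)
        black -= 1
    # players who never get a turn before a white ball appears win with probability 0
    return probs + [Fraction(0)] * (players - len(probs))

probs = first_white_probabilities(white=2, black=3, players=4)
expectations = [200 * p for p in probs]
print([str(p) for p in probs])          # ['2/5', '3/10', '1/5', '1/10']
print([int(e) for e in expectations])   # [80, 60, 40, 20]
```

The probabilities sum to 1, which confirms that someone must win within four draws.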


Illustration 25. Prove that the sum of the probabilities of all possibilities in two independent events amounts to certainty.


Solution. Let the probabilities of success and failure in the first event be $p_1$ and $q_1$, and in the second event $p_2$ and $q_2$. Then

the chance of success in the first event and success in the second event is $p_1 p_2$;

the chance of success in the first event and failure in the second event is $p_1 q_2$;

the chance of failure in the first event and success in the second event is $q_1 p_2$;

the chance of failure in the first event and failure in the second event is $q_1 q_2$.

These are all the possibilities, and the sum of these possibilities is

$$P = p_1 p_2 + p_1 q_2 + q_1 p_2 + q_1 q_2 = p_1(p_2 + q_2) + q_1(p_2 + q_2).$$

But $p_2 + q_2 = 1$ and $p_1 + q_1 = 1$, so

$$P = p_1(1) + q_1(1) = p_1 + q_1 = 1.$$


Illustration 26. A box contains 3 red and 7 white balls. One ball is drawn at random and in its place a ball of the other colour is put in the box. Now one ball is drawn at random from the box. Find the probability that it is red.
(B. Com., Bombay, 1974)


Solution. At the second draw a red ball can be drawn in two mutually exclusive ways: either the ball drawn at the first draw is red, or it is white.

When the first ball drawn is red:
The probability of drawing a red ball = $\frac{3}{10}$.

Now a white ball is put in the box in place of the red ball drawn, so the box contains 2 red and 8 white balls.

So, the probability of drawing a red ball again = $\frac{2}{10}$.

When the first ball drawn is white:

The probability of drawing a white ball = $\frac{7}{10}$.

Now a red ball is put in the box in place of the white ball drawn. There are thus 4 red and 6 white balls in the box.

The probability of drawing a red ball = $\frac{4}{10}$.
Hence the probability of drawing a red ball

$$= \left(\frac{3}{10} \times \frac{2}{10}\right) + \left(\frac{7}{10} \times \frac{4}{10}\right) = \frac{6}{100} + \frac{28}{100} = \frac{34}{100} = 0.34.$$

Illustration 27. (a) Five men in a company of 20 are graduates. If 3 men are picked out of 20 at random, (i) what is the probability that all are graduates?

(b) What is the probability of at least one graduate? (I. C. W. A., 1969)
Solution: (i) The total number of ways of picking 3 men out of 20 men = ${}^{20}C_3$.

The number of ways of picking 3 men (all being graduates) out of 5 men = ${}^{5}C_3$.

Hence the probability of picking 3 men, all being graduates, out of 20 men

$$= \frac{{}^{5}C_3}{{}^{20}C_3} = \frac{5 \times 4 \times 3}{20 \times 19 \times 18} = \frac{1}{114}.$$

(b) There are 15 non-graduates, so

$$P(\text{no graduate}) = \frac{{}^{15}C_3}{{}^{20}C_3} = \frac{15 \times 14 \times 13}{20 \times 19 \times 18} = \frac{91}{228}.$$

P(at least one graduate) = 1 − P(no graduate)

$$= 1 - \frac{91}{228} = \frac{137}{228}.$$
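Both answers are instances of the hypergeometric probability, which counts picks made without replacement from a group containing a fixed number of "successes". The following is a minimal sketch. The function name `hypergeom_pmf` is chosen here for illustration, and the code relies only on `math.comb`.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, successes, population, draws):
    """P(exactly k successes when `draws` items are picked without replacement)."""
    return Fraction(comb(successes, k) * comb(population - successes, draws - k),
                    comb(population, draws))

all_graduates = hypergeom_pmf(3, successes=5, population=20, draws=3)
at_least_one = 1 - hypergeom_pmf(0, successes=5, population=20, draws=3)
print(all_graduates, at_least_one)  # 1/114 137/228
```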
Illustration 28. An integer is chosen at random from the integers 1, 2, 3, ..., n. A and B denote the events that it is a multiple of 2 and of 3 respectively. Show that A and B are independent events when n = 96 but not when n = 100.
Solution. The events A and B consist of the following integers:

A: 2, 4, 6, 8, 10, ... ;  B: 3, 6, 9, ...

Also, the event AB means that the integer is a multiple of both 2 and 3, i.e., of 6. Therefore, for n = 96 as well as n = 100, the event AB consists of the following integers:

AB: 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96

When n = 96:
Probability of the event A, or $P(A) = \frac{48}{96} = \frac{1}{2}$

Probability of the event B, or $P(B) = \frac{32}{96} = \frac{1}{3}$

Probability of the event AB, or $P(AB) = \frac{16}{96} = \frac{1}{6}$

Since $P(A) \times P(B) = \frac{1}{2} \times \frac{1}{3} = \frac{1}{6} = P(AB)$, it follows by the multiplication theorem that the events are independent.

When n = 100:
Probability of the event A, $P(A) = \frac{50}{100} = \frac{1}{2}$

Probability of the event B, $P(B) = \frac{33}{100}$

Probability of the event AB, $P(AB) = \frac{16}{100}$

Since $P(A) \times P(B) = \frac{33}{200} \neq \frac{16}{100} = P(AB)$, it follows by the multiplication theorem that the events are not independent.
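The multiplication-theorem test can be run directly by counting multiples. The following sketch assumes that every integer from 1 to n is equally likely. The function name `independence_check` is illustrative.

```python
from fractions import Fraction

def independence_check(n):
    """Return (independent?, P(A), P(B), P(AB)) for an integer drawn uniformly
    from 1..n, with A = 'multiple of 2' and B = 'multiple of 3'."""
    nums = range(1, n + 1)
    p_a = Fraction(sum(1 for k in nums if k % 2 == 0), n)
    p_b = Fraction(sum(1 for k in nums if k % 3 == 0), n)
    p_ab = Fraction(sum(1 for k in nums if k % 6 == 0), n)  # multiple of both 2 and 3
    return p_a * p_b == p_ab, p_a, p_b, p_ab

for n in (96, 100):
    independent, p_a, p_b, p_ab = independence_check(n)
    print(n, independent, p_a, p_b, p_ab)
```

Independence holds for n = 96 because 96 is a multiple of 6, so the three counts divide evenly.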


Illustration 29. (a) Given that four airlines provide service between Delhi and Bombay, in how many ways can a person select airlines for a trip from Delhi to Bombay and back (i) if he returns by the same airline, and (ii) if he returns by a different airline?

(b) 6 coins are tossed simultaneously. What is the probability that they will fall with 4 heads and 2 tails up?

(d) In how many ways can the word 'PROBABILITY' be arranged?

(e) What is the probability of getting a number greater than two with an ordinary die?

(f) A pot contains 4 white and 6 red balls. Two drawings of 3 balls are made. Find the probability that the first drawing will give all the three white balls and the second all the three red balls, if the balls are replaced before the trial.

(M. Com. Raj., 1969)
Solution. (a) (i) Since the person travels from Delhi to Bombay and back by the same airline, he can select any one of the four airlines. Hence the number of ways of selecting an airline is 4.

(ii) Since the person travels from Delhi to Bombay by one airline and returns by another, there are four ways of selecting the airline from Delhi to Bombay. There then remain three airlines by which he can return, so the number of ways of selecting an airline for the return is 3.

Hence the number of ways of selecting airlines for a trip from Delhi to Bombay and back, such that he travels by different airlines each way,

= 4 × 3 = 12 ways.
(b) The probability of a head = $\frac{1}{2}$.
The probability of a tail = $\frac{1}{2}$.
Out of 6 coins, any 4 may show heads and the remaining 2 tails.

Hence the required probability $= {}^{6}C_4 \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^2 = \frac{15}{64}$.


(c) The first prize can be won in 5 ways. Then 4 prizes remain, so the second prize can be won in 4 ways. Thus there are 5 × 4 = 20 ways of winning two prizes.

Since there are 25 punches on the punch board, there are 25 ways of selecting the first punch and 24 ways of selecting the second punch.

Hence there are 25 × 24 = 600 ways of selecting two punches.

Hence the probability of winning two prizes

$$= \frac{20}{600} = \frac{1}{30}.$$

(d) There are 11 letters, in which 'B' comes twice and 'I' also comes twice. If all the letters were different, there would be 11! ways of arranging them. Since the 2 B's and the 2 I's can be interchanged among themselves in 2! × 2! ways without producing a new arrangement, the total number of ways of arranging the word 'PROBABILITY' is

$$\frac{11!}{2! \times 2!} = 9{,}979{,}200.$$


(e) On an ordinary die, 3, 4, 5 and 6 are the numbers greater than 2.
Hence the probability of getting a number greater than 2 = $\frac{4}{6} = \frac{2}{3}$.


(f) The number of ways of selecting 3 white balls out of 4 white balls is ${}^{4}C_3$. Also, the number of ways of selecting 3 balls out of 10 balls is ${}^{10}C_3$.
The probability of getting 3 white balls at the first draw

$$= \frac{{}^{4}C_3}{{}^{10}C_3} = \frac{4}{120} = \frac{1}{30}.$$

The number of ways of selecting 3 red balls out of 6 red balls is ${}^{6}C_3$.
The probability of getting 3 red balls at the second draw

$$= \frac{{}^{6}C_3}{{}^{10}C_3} = \frac{20}{120} = \frac{1}{6}.$$

Hence the probability of drawing 3 white balls at the first draw and 3 red balls at the second draw

$$= \frac{1}{30} \times \frac{1}{6} = \frac{1}{180}.$$
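Parts (b) and (d) reduce to two standard counts: a binomial probability and the number of arrangements of a word with repeated letters, $n!/(n_1!\,n_2!\cdots)$. The following minimal sketch checks both. The helper names `arrangements` and `binomial_pmf` are illustrative, and `math.prod` requires Python 3.8 or later.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod

def arrangements(word):
    """Distinct arrangements of the letters of `word`: n! / (n1! n2! ...)."""
    counts = Counter(word)
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(arrangements("PROBABILITY"))           # 9979200
print(binomial_pmf(4, 6, Fraction(1, 2)))    # 15/64
```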
Illustration 30. A market research firm is interested in surveying certain attitudes in a small community. There are 125 households broken down according to income, ownership of a telephone and ownership of a TV.

                  Households with annual income   Households with annual income
                  of Rs. 8,000 or less            above Rs. 8,000
                  Telephone      No               Telephone      No
                  Subscriber     Telephone        Subscriber     Telephone
Own TV set           27             20               18             10
No TV set            18             10               12             10
(i) What is the probability of obtaining a TV owner in drawing at random?
(ii) If a household has income over Rs. 8,000 and is a telephone subscriber, what is the probability that it has a TV?
(iii) What is the conditional probability of drawing a household that owns a TV, given that the household is a telephone subscriber?
(iv) Are the events 'ownership of a TV' and 'telephone subscriber' statistically independent? Comment. (M.B.A., Delhi, 1971)
Solution:
(i) Probability of drawing a TV owner at random, i.e.,

$$P = \frac{\text{Number of favourable cases}}{\text{Total number of equally likely cases}} = \frac{27 + 20 + 18 + 10}{125} = \frac{75}{125} = 0.6.$$

(ii) There are 30 households (18 + 12) whose income is above Rs. 8,000 and which are also telephone subscribers. Of these, 18 own a TV set. Hence the probability of a household in this group having a TV set $= \frac{18}{30} = 0.6$.

(iii) Out of the 75 households that are telephone subscribers (27 + 18 + 18 + 12), 45 (27 + 18) also own a TV. Hence the conditional probability of drawing a household that owns a TV, given that the household is a telephone subscriber, is

$$\frac{45}{75} = 0.6.$$

(iv) Two events A and B are statistically independent if

$$P(AB) = P(A) \times P(B).$$

Let A denote the households that have a TV and B those that are telephone subscribers.

Probability of a household owning a TV, i.e., $P(A) = \frac{75}{125} = \frac{3}{5}$ [since 75 households out of a total of 125 own a TV].

Probability of a household being a telephone subscriber, i.e., $P(B) = \frac{75}{125} = \frac{3}{5}$ [since 75 households out of a total of 125 are telephone subscribers].

$$P(AB) = \frac{45}{125} = \frac{9}{25}, \qquad P(A) \times P(B) = \frac{3}{5} \times \frac{3}{5} = \frac{9}{25}.$$

Hence $P(AB) = P(A) \times P(B)$.

We, therefore, conclude that the events 'ownership of a TV' and 'telephone subscriber' are statistically independent.
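All four parts can be read off the table with one counting helper. The following sketch stores each cell under a key of the form (income band, telephone subscriber?, owns TV?). The labels "low" and "high" are hypothetical names for the two income bands, introduced here only for the example.

```python
from fractions import Fraction

# (income band, telephone subscriber?, owns TV?) -> number of households
counts = {
    ("low", True, True): 27, ("low", False, True): 20,
    ("high", True, True): 18, ("high", False, True): 10,
    ("low", True, False): 18, ("low", False, False): 10,
    ("high", True, False): 12, ("high", False, False): 10,
}
total = sum(counts.values())

def count(pred):
    """Number of households whose (income, tel, tv) key satisfies `pred`."""
    return sum(v for k, v in counts.items() if pred(*k))

p_tv = Fraction(count(lambda inc, tel, tv: tv), total)
p_tel = Fraction(count(lambda inc, tel, tv: tel), total)
p_tv_and_tel = Fraction(count(lambda inc, tel, tv: tv and tel), total)
p_tv_given_high_tel = Fraction(count(lambda inc, tel, tv: inc == "high" and tel and tv),
                               count(lambda inc, tel, tv: inc == "high" and tel))

print(total, p_tv, p_tv_given_high_tel, p_tv_and_tel / p_tel)
print("independent:", p_tv * p_tel == p_tv_and_tel)
```

Exact fractions are used so that the independence test in part (iv) is an exact equality and is not affected by rounding.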
Illustration 31. (a) 'A' speaks the truth in 60 per cent of cases and 'B' in 70 per cent of cases. In what percentage of cases are they likely to contradict each other in stating the same fact?

(b) Three groups of workers contain 3 men and one woman, 2 men and 2 women, and 1 man and 3 women respectively. One worker is selected at random from each group. What is the probability that the group selected consists of 1 man and 2 women? (M. Com., Delhi, 1971)

Solution. (a) They will contradict each other only if one of them speaks the truth and the other tells a lie.

The probability that A speaks the truth and B a lie $= \frac{60}{100} \times \frac{30}{100} = \frac{9}{50}$.

The probability that B speaks the truth and A a lie $= \frac{70}{100} \times \frac{40}{100} = \frac{14}{50}$.

The total probability $= \frac{9}{50} + \frac{14}{50} = \frac{23}{50}$.

The percentage of cases in which they contradict each other

$$= \frac{23}{50} \times 100 = 46\%.$$


(b) There are three possibilities in this case:
(i) a man is selected from the 1st group and women from the 2nd and 3rd groups ; or
(ii) a man is selected from the 2nd group and women from the 1st and 3rd groups ; or
(iii) a man is selected from the 3rd group and women from the 1st and 2nd groups.
The probability of selecting a group of 1 man and 2 women

$$= \left(\frac{3}{4} \times \frac{2}{4} \times \frac{3}{4}\right) + \left(\frac{1}{4} \times \frac{2}{4} \times \frac{3}{4}\right) + \left(\frac{1}{4} \times \frac{2}{4} \times \frac{1}{4}\right) = \frac{18}{64} + \frac{6}{64} + \frac{2}{64} = \frac{26}{64} = \frac{13}{32}.$$
Illustration 32. What is the chance that a leap year, selected at random, will contain 53 Sundays?

Solution. A leap year consists of 366 days and therefore contains 52 complete weeks and 2 extra days. These 2 days may make the following 7 combinations:
Monday and Tuesday
Tuesday and Wednesday
Wednesday and Thursday
Thursday and Friday
Friday and Saturday
Saturday and Sunday
Sunday and Monday
Of these seven equally likely cases only the last two are favourable. Hence the required chance = 2/7.


Illustration 33. A University has to select an examiner from a list of 50 persons: 20 of them are women and 30 men, 10 of them know Hindi and 40 do not, and 15 of them are teachers and the remaining 35 are not. What is the probability of the University selecting a Hindi-knowing woman teacher? (M. Com., Allahabad, 1974)
Solution.
Probability of selecting a woman = $\frac{20}{50}$
Probability of selecting a teacher = $\frac{15}{50}$
Probability of selecting a Hindi-knowing candidate = $\frac{10}{50}$

Treating the three characteristics as independent events, the probability of the University selecting a Hindi-knowing woman teacher is

$$\frac{20}{50} \times \frac{15}{50} \times \frac{10}{50} = \frac{3}{125} = 0.024.$$
Illustration 34. A bag contains 10 white and 6 black balls. 4 balls are successively drawn out and not replaced. What is the probability that they are alternately of different colour?
Solution. (a) Beginning with white:

The probability of drawing a white ball = $\frac{10}{16}$
the probability of then drawing a black ball = $\frac{6}{15}$
the probability of then drawing a white ball = $\frac{9}{14}$
the probability of then drawing a black ball = $\frac{5}{13}$

Therefore the probability of the compound event

$$= \frac{10}{16} \times \frac{6}{15} \times \frac{9}{14} \times \frac{5}{13} = \frac{45}{728}.$$

(b) Similarly, beginning with black:

The probability of drawing a black ball = $\frac{6}{16}$
the probability of then drawing a white ball = $\frac{10}{15}$
the probability of then drawing a black ball = $\frac{5}{14}$
the probability of then drawing a white ball = $\frac{9}{13}$

The probability of the compound event

$$= \frac{6}{16} \times \frac{10}{15} \times \frac{5}{14} \times \frac{9}{13} = \frac{45}{728}.$$

The above two events are mutually exclusive. Therefore, the required chance that 4 successively drawn balls are alternately of different colours (without specifying the colour with which to begin)

$$= \frac{45}{728} + \frac{45}{728} = \frac{45}{364} \approx 0.124.$$
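Each sequence probability above is a product of conditional probabilities, with the bag shrinking after every draw. The following minimal sketch computes such products for any colour order without replacement. The helper name `sequence_prob` is illustrative.

```python
from fractions import Fraction

def sequence_prob(sequence, counts):
    """Probability of drawing colours in exactly this order, without replacement."""
    remaining = dict(counts)
    p = Fraction(1)
    for colour in sequence:
        p *= Fraction(remaining[colour], sum(remaining.values()))
        remaining[colour] -= 1  # the drawn ball is not put back
    return p

bag = {"white": 10, "black": 6}
wbwb = sequence_prob(["white", "black", "white", "black"], bag)
bwbw = sequence_prob(["black", "white", "black", "white"], bag)
print(wbwb, bwbw, wbwb + bwbw)  # 45/728 45/728 45/364
```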
Illustration 35. (a) A can solve 90 per cent of the problems given in a book and B can solve 70 per cent. What is the probability that at least one of them will solve a problem selected at random?

(b) In a single throw of two dice, what is the probability of obtaining a sum of at least 10?


Solution. (a) Probability that A will not be able to solve the problem $= 1 - \frac{9}{10} = \frac{1}{10}$.

Probability that B will not be able to solve the problem $= 1 - \frac{7}{10} = \frac{3}{10}$.

Probability that neither of them will be able to solve the problem $= \frac{1}{10} \times \frac{3}{10} = \frac{3}{100}$.

Hence the probability that at least one of them will solve the problem $= 1 - \frac{3}{100} = \frac{97}{100} = 0.97$.


(b) Two dice can be thrown together in 6 × 6 = 36 ways.
We have to find the probability of getting a sum of at least ten, i.e., either ten or eleven or twelve.

The probability of getting a total of 10 in a single throw of two dice, (6, 4), (4, 6), (5, 5), $= \frac{3}{36}$.

The probability of getting a total of 11 in a single throw of two dice, (6, 5), (5, 6), $= \frac{2}{36}$.

The probability of getting a total of 12 in a single throw of two dice, (6, 6), $= \frac{1}{36}$.

Since the events are mutually exclusive, the probability of obtaining a sum of at least 10 in a single throw of two dice

$$= \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{6}{36} = \frac{1}{6}.$$


Illustration 36. An urn contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at random, determine the probability that (a) all 3 are red, (b) all 3 are white, (c) 2 are red and 1 is blue, (d) at least one is white, (e) 1 of each colour is drawn, (f) the balls are drawn in the order red, white, blue.

Solution. (a) Total number of balls in the urn = 8 + 3 + 9 = 20.

3 balls can be drawn out of 20 in ${}^{20}C_3$ ways.

3 red balls can be drawn out of 8 red in ${}^{8}C_3$ ways.

∴ Probability that all three balls drawn are red

$$= \frac{{}^{8}C_3}{{}^{20}C_3} = \frac{56}{1140} = \frac{14}{285}.$$

(b) Probability that all three balls drawn are white

$$= \frac{{}^{3}C_3}{{}^{20}C_3} = \frac{1}{1140}.$$

(c) Probability that, out of 3 balls, 2 are red and 1 is blue

$$= \frac{{}^{8}C_2 \times {}^{9}C_1}{{}^{20}C_3} = \frac{28 \times 9}{1140} = \frac{252}{1140} = \frac{21}{95}.$$

(d) Probability that none of the balls drawn is white

$$= \frac{{}^{17}C_3}{{}^{20}C_3} = \frac{680}{1140} = \frac{34}{57}.$$

∴ The probability that at least one ball is white $= 1 - \frac{34}{57} = \frac{23}{57}$.

(e) Probability that one ball of each colour is drawn

$$= \frac{{}^{8}C_1 \times {}^{3}C_1 \times {}^{9}C_1}{{}^{20}C_3} = \frac{216}{1140} = \frac{18}{95}.$$

(f) The probability of drawing first a red ball $= \frac{8}{20}$, then a white ball $= \frac{3}{19}$, and then a blue ball $= \frac{9}{18}$.

The probability that the balls are drawn in the order red, white, blue

$$= \frac{8}{20} \times \frac{3}{19} \times \frac{9}{18} = \frac{3}{95}.$$
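The six answers can be recomputed together with `math.comb`. Parts (a) to (e) count unordered selections of 3 balls out of 20. Part (f) is an ordered sequence drawn without replacement. The following sketch uses dictionary keys that are illustrative labels only.

```python
from fractions import Fraction
from math import comb

red, white, blue = 8, 3, 9
n = red + white + blue
ways = comb(n, 3)  # unordered selections of 3 balls out of 20

p = {
    "a_all_red": Fraction(comb(red, 3), ways),
    "b_all_white": Fraction(comb(white, 3), ways),
    "c_two_red_one_blue": Fraction(comb(red, 2) * comb(blue, 1), ways),
    "d_at_least_one_white": 1 - Fraction(comb(red + blue, 3), ways),
    "e_one_of_each": Fraction(red * white * blue, ways),
    # (f) ordered draws without replacement: red, then white, then blue
    "f_order_rwb": Fraction(red, n) * Fraction(white, n - 1) * Fraction(blue, n - 2),
}
for part, value in p.items():
    print(part, value)
```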


Illustration 37. (i) A class consists of 80 students; 25 of them are girls and 55 boys, 10 of them are rich and the remaining poor, and 20 of them are fair complexioned. What is the probability of selecting a fair complexioned rich girl?

Solution. Probability of selecting a fair complexioned person $= \frac{20}{80} = \frac{1}{4}$.

Probability of selecting a rich person $= \frac{10}{80} = \frac{1}{8}$.

The probability of selecting a girl $= \frac{25}{80} = \frac{5}{16}$.

Treating these characteristics as independent, the probability of selecting a fair complexioned rich girl

$$= \frac{1}{4} \times \frac{1}{8} \times \frac{5}{16} = \frac{5}{512} \approx 0.0098.$$


(ii) Explain why there must be a mistake in the following statement:

A quality control engineer claims that the probabilities that a large consignment of glass bricks contains 0, 1, 2, 3, 4 or 5 defectives are 0.11, 0.23, 0.37, 0.16, 0.09 and 0.05 respectively. (M. Com., Meerut, 1973)


Solution. The events '0, 1, 2, 3, 4 or 5 defectives' are mutually exclusive, so the probability that one of them happens is the sum of their probabilities, and a probability can never exceed one. In the question given

$$P(A) + P(B) + P(C) + \cdots = 0.11 + 0.23 + 0.37 + 0.16 + 0.09 + 0.05 = 1.01.$$

This is greater than one and hence there is some mistake in the statement given.

Illustration 38. (a) What is the probability of getting only one head in five tosses of a coin?
(b) The probability that a boy will get a scholarship is 0.90 and that a girl will get one is 0.80. What is the probability that at least one of them will get the scholarship? (M.A. Econ. Agra, 1974)
Solution:


(a) The probability of getting a head in a single toss of a coin = $\frac{1}{2}$.

There are five ways in which one head and 4 tails can appear.

∴ The probability of getting only one head in five tosses of a coin

$$= {}^{5}C_1 \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)^4 = \frac{5}{32}.$$

(b) The probability that a boy will get a scholarship = 0.9.

The probability that a girl will get a scholarship = 0.8.

Treating the two events as independent, the probability that at least one of them will get the scholarship is

$$P(A) + P(B) - P(AB) = 0.9 + 0.8 - (0.9 \times 0.8) = 1.7 - 0.72 = 0.98.$$

Illustration 39. (i) Suppose it is 11 to 5 against a person A, who is now 38 years of age, living till he is 73, and 5 to 3 against B, now 43, living till he is 78. Find the chance that at least one of these persons will be alive 35 years hence.

(B. Com., Bombay, 1974)

Solution:
The chance that A will die within 35 years $= \frac{11}{16}$.
The chance that B will die within 35 years $= \frac{5}{8}$.
Since the events are independent, the chance that both will die is

$$\frac{11}{16} \times \frac{5}{8} = \frac{55}{128}.$$

∴ The chance that at least one will be alive, i.e., that both will not be dead,

$$= 1 - \frac{55}{128} = \frac{73}{128} \approx 0.57.$$


line must pass both inspection points before being packaged for shipment. The pro- 


Solution. A defective radio has to pass both inspection points before it is packaged. The probability that such a radio will pass the first inspection point is $P(A) = 1 - P_1 = 0.30$. The probability of passing the second inspection point, given that the defective radio passed the first point, is $P(B/A) = 1 - P_2 = 0.20$. The probability that a defective radio will pass both inspection points is, therefore,

$$P(A \text{ and } B) = (1 - P_1)(1 - P_2) = (0.30)(0.20) = 0.06.$$

We conclude that about 6 per cent of the defective radios will be packaged for shipment under the present inspection plan.

Illustration 41.

(a) Find the chance of throwing exactly 10 in one throw with 3 dice.

(b) Find the chance of throwing more than 15 in one throw with 3 dice.

(M. Com., Meerut, 1973)

Solution:


(a) The number of equally likely cases is 6 × 6 × 6 = 216. A total of 10 can be made up from the sets {1, 3, 6}, {1, 4, 5} and {2, 3, 5}, each of which can be arranged in 3! = 6 orders, and from the sets {2, 2, 6}, {2, 4, 4} and {3, 3, 4}, each of which can be arranged in 3 orders (for example (2, 6, 2), (6, 2, 2), (2, 2, 6)). This gives 3 × 6 + 3 × 3 = 27 ways.

The required chance $= \frac{27}{216} = \frac{1}{8}$.

(b) The number of equally likely cases is 6 × 6 × 6 = 216. Throwing more than 15 means throwing 16 or 17 or 18. Now 16 can be made up of (6, 6, 4), (6, 4, 6), (4, 6, 6), (5, 5, 6), (5, 6, 5), (6, 5, 5), and the number of these cases is 6. 17 can be made up of (6, 6, 5), (6, 5, 6) and (5, 6, 6), and the number of these cases is 3. 18 can be made up of (6, 6, 6) only, and this can occur in only 1 way. Therefore, the number of favourable cases is 6 + 3 + 1 = 10.

The required chance $= \frac{10}{216} \approx 0.046$.
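Listing the cases by hand makes it easy to miss some orders. The following sketch, which assumes three fair dice, tallies the totals of all 216 ordered outcomes with `collections.Counter` and confirms the counts of 27 for a total of 10 and 10 for totals above 15.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# How many of the 6**3 equally likely ordered outcomes give each total
totals = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))
n = 6 ** 3

p_ten = Fraction(totals[10], n)
p_more_than_15 = Fraction(sum(totals[t] for t in (16, 17, 18)), n)
print(totals[10], p_ten)                       # 27 1/8
print(p_more_than_15, float(p_more_than_15))
```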


Illustration 42. A coin is tossed. If it turns up heads, two balls will be drawn from urn A; otherwise two balls will be drawn from urn B. Urn A contains three black and 5 white balls. Urn B contains seven black and one white ball. In both cases, selections are to be made with replacement. What is the probability that urn A is used given that both the balls drawn are black? (M. Com., Delhi, 1971)


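Solution. Probability of getting a head or a tail = 1/2. Urn A contains 3 black and 5 white balls, i.e., 3 + 5 = 8 balls. Since the balls are drawn with replacement, the probability of two black balls from urn A is

$$\left(\frac{3}{8}\right)^2 = \frac{9}{64}.$$

Urn B contains 7 black and 1 white ball, i.e., 8 balls, so the probability of two black balls from urn B is

$$\left(\frac{7}{8}\right)^2 = \frac{49}{64}.$$

Hence, by Bayes' theorem, the probability that urn A is used given that both the balls are black is

$$\frac{\frac{1}{2} \times \frac{9}{64}}{\frac{1}{2} \times \frac{9}{64} + \frac{1}{2} \times \frac{49}{64}} = \frac{9}{58} \approx 0.155.$$

If the two balls were instead drawn without replacement, the probabilities of two black balls would be ${}^{3}C_2/{}^{8}C_2 = 3/28$ and ${}^{7}C_2/{}^{8}C_2 = 21/28$, and the required probability would be $\frac{3}{3 + 21} = \frac{1}{8} = 0.125$.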
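Bayes' theorem here weights each urn's likelihood of "two black" by the prior probability 1/2 from the coin. The following sketch computes the posterior both with replacement (as the question states) and without replacement, for comparison. The function name and its `replace` flag are illustrative.

```python
from fractions import Fraction

def posterior_urn_a(replace=True):
    """P(urn A | two black balls) when a fair coin selects urn A or urn B."""
    prior = Fraction(1, 2)

    def both_black(black, total):
        if replace:
            return Fraction(black, total) ** 2
        return Fraction(black, total) * Fraction(black - 1, total - 1)

    like_a = both_black(3, 8)  # urn A: 3 black, 5 white
    like_b = both_black(7, 8)  # urn B: 7 black, 1 white
    return prior * like_a / (prior * like_a + prior * like_b)

print(posterior_urn_a(True), posterior_urn_a(False))  # 9/58 1/8
```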


Illustration 43. There are 5 white and 7 red balls in a bag. A ball is drawn and then replaced. What is the probability that a white and a red ball are drawn in that order? What would be the probability if the ball drawn were not put back into the bag? (M. Com., Meerut, 1972)
Solution:
Total number of balls in the bag = 5 + 7 = 12.
Probability of drawing a white ball $= \frac{5}{12}$.
Probability of drawing a red ball $= \frac{7}{12}$.
Probability that a white and a red ball are drawn in that order $= \frac{5}{12} \times \frac{7}{12} = \frac{35}{144}$.

When the ball drawn is not replaced:

In such a case only 11 balls will be left, including all 7 red balls.
Required probability $= \frac{5}{12} \times \frac{7}{11} = \frac{35}{132}$.

Illustration 44. Two cards are randomly drawn from a pack of 52 cards and thrown away. What is the probability of drawing an ace in a single draw from the remaining 50 cards? (M. Com., Meerut, 1975)

Solution. The two cards randomly drawn could relate to any of the following possibilities:

(i) Both the cards are aces.

(ii) Neither of them is an ace.

(iii) One of them is an ace and the other not.

The probability in each of these cases will be determined as follows:

(i) Probability of getting both aces

$$= \frac{{}^{4}C_2}{{}^{52}C_2} = \frac{6}{1326}.$$

Probability of getting an ace from the remaining 50 cards (which contain 2 aces) $= \frac{{}^{2}C_1}{{}^{50}C_1} = \frac{2}{50}$.

Compound probability $= \frac{6}{1326} \times \frac{2}{50} = \frac{1}{5525}$.

(ii) Probability of getting no ace

$$= \frac{{}^{48}C_2}{{}^{52}C_2} = \frac{1128}{1326}.$$

Out of the remaining 50 cards (which contain all 4 aces), the probability of getting an ace $= \frac{{}^{4}C_1}{{}^{50}C_1} = \frac{4}{50}$.

Compound probability $= \frac{1128}{1326} \times \frac{4}{50} = \frac{376}{5525}$.

(iii) Probability of getting one ace and one other card

$$= \frac{{}^{4}C_1 \times {}^{48}C_1}{{}^{52}C_2} = \frac{192}{1326}.$$

Out of the remaining 50 cards (which contain 3 aces), the probability of getting an ace $= \frac{{}^{3}C_1}{{}^{50}C_1} = \frac{3}{50}$.

Compound probability $= \frac{192}{1326} \times \frac{3}{50} = \frac{48}{5525}$.

Since all three situations are mutually exclusive, the desired probability is given by the addition theorem, i.e.,

$$\frac{1}{5525} + \frac{376}{5525} + \frac{48}{5525} = \frac{425}{5525} = \frac{1}{13} \approx 0.077.$$
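The three cases are a total-probability sum over the number of aces among the discarded cards. The following sketch loops over that number instead of writing out each case. Its parameters default to a standard 52-card pack with 4 aces and 2 discarded cards, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def p_ace_after_discard(deck=52, aces=4, discarded=2):
    """P(ace on the next draw) after `discarded` unseen cards are thrown away."""
    pairs = comb(deck, discarded)
    total = Fraction(0)
    for aces_out in range(0, min(aces, discarded) + 1):
        # probability that exactly `aces_out` aces were among the discarded cards
        p_case = Fraction(comb(aces, aces_out) * comb(deck - aces, discarded - aces_out), pairs)
        # probability of an ace from what is left
        p_ace_next = Fraction(aces - aces_out, deck - discarded)
        total += p_case * p_ace_next
    return total

print(p_ace_after_discard())  # 1/13
```

The result equals the unconditional 4/52, as it should, because the discarded cards are never seen.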
Illustration 45. There are three urns containing 2 white and 3 black balls, 3 white and 2 black balls, and 4 white and 1 black ball respectively. There is equal probability of each urn being chosen. A ball is drawn from an urn chosen at random. Find the probability that a white ball is drawn.

Solution. Let $A_i$ be the event that the ith urn is chosen, i = 1, 2, 3, and B the event that a white ball is drawn. Then

$$P(B) = P(A_1)P(B/A_1) + P(A_2)P(B/A_2) + P(A_3)P(B/A_3),$$

where $P(A_1) = P(A_2) = P(A_3) = \frac{1}{3}$ and

$$P(B/A_1) = \frac{2}{5}, \quad P(B/A_2) = \frac{3}{5}, \quad P(B/A_3) = \frac{4}{5}.$$

$$\therefore P(B) = \frac{1}{3}\left(\frac{2}{5} + \frac{3}{5} + \frac{4}{5}\right) = \frac{1}{3} \times \frac{9}{5} = \frac{3}{5} = 0.6.$$


Illustration 46. A manufacturing firm produces steel pipes in three plants with daily production volumes of 500, 1,000 and 2,000 units respectively. According to past experience, the fractions of defective output produced by the three plants are respectively 0.005, 0.008 and 0.010. If a pipe is selected from a day's total production and found to be defective, find out:

(i) from which plant the pipe is most likely to have come;

(ii) the probability that it came from the first plant.

(M. Com., Delhi, 1973)

Solution. (i) To apply Bayes' theorem, we define the following events from the problem:

$A_1$: the pipe comes from the first plant (production volume 500 units)

$A_2$: the pipe comes from the second plant (production volume 1,000 units)

$A_3$: the pipe comes from the third plant (production volume 2,000 units)

E: the pipe is defective.

From these events, $P(A_i/E)$ is the probability that the item is produced by the ith plant, given that the item is defective. Also, $P(A_i \cap E)$ is the probability that the item is produced by the ith plant and is defective. Information in the problem gives the following probabilities in connection with the random selection of a pipe from a day's total production:

(a) Prior probabilities:

$$P(A_1) = \frac{500}{500 + 1000 + 2000} = \frac{1}{7}, \quad P(A_2) = \frac{1000}{3500} = \frac{2}{7}, \quad P(A_3) = \frac{2000}{3500} = \frac{4}{7}$$

(b) Likelihoods:

$$P(E/A_1) = 0.005, \quad P(E/A_2) = 0.008, \quad P(E/A_3) = 0.010$$

(c) Joint probabilities:

$$P(A_1 \cap E) = P(A_1)P(E/A_1) = \frac{1}{7}(0.005) = \frac{0.005}{7}$$

$$P(A_2 \cap E) = P(A_2)P(E/A_2) = \frac{2}{7}(0.008) = \frac{0.016}{7}$$

$$P(A_3 \cap E) = P(A_3)P(E/A_3) = \frac{4}{7}(0.010) = \frac{0.040}{7}$$

Sum: $P(E) = \sum P(A_i)P(E/A_i) = \frac{0.061}{7}$

(d) Posterior probabilities:

$$P(A_1/E) = \frac{P(A_1 \cap E)}{P(E)} = \frac{0.005/7}{0.061/7} = \frac{5}{61}$$

$$P(A_2/E) = \frac{P(A_2 \cap E)}{P(E)} = \frac{0.016/7}{0.061/7} = \frac{16}{61}$$

$$P(A_3/E) = \frac{P(A_3 \cap E)}{P(E)} = \frac{0.040/7}{0.061/7} = \frac{40}{61}$$

Since $P(A_3/E)$ is by far the greatest posterior probability, it is most probable that the defective item has been drawn from the output of the third plant. As a check on the above calculations, the sum of all the posterior probabilities must be unity: $\frac{5}{61} + \frac{16}{61} + \frac{40}{61} = 1$.
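(ii) Probability that the pipe came from the first plant

$$= \frac{\frac{1}{7} \times 0.005}{\left(\frac{1}{7} \times 0.005\right) + \left(\frac{2}{7} \times 0.008\right) + \left(\frac{4}{7} \times 0.010\right)} = \frac{0.005}{0.005 + 0.016 + 0.040} = \frac{5}{61} \approx 0.082.$$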
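The prior, likelihood, joint and posterior steps translate line for line into code. The following minimal sketch uses exact fractions, so that the check "posteriors sum to unity" holds exactly. The variable names are illustrative.

```python
from fractions import Fraction

volumes = [500, 1000, 2000]  # daily output of plants 1, 2, 3
defect_rates = [Fraction(5, 1000), Fraction(8, 1000), Fraction(10, 1000)]

priors = [Fraction(v, sum(volumes)) for v in volumes]           # P(A_i)
joints = [pr * rate for pr, rate in zip(priors, defect_rates)]  # P(A_i and E)
p_defective = sum(joints)                                       # P(E)
posteriors = [j / p_defective for j in joints]                  # P(A_i / E)

most_likely = 1 + posteriors.index(max(posteriors))  # plant number, counted from 1
print(", ".join(str(p) for p in posteriors))  # 5/61, 16/61, 40/61
print(most_likely, float(posteriors[0]))
```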


Illustration 47. A product is assembled from three components X, Y and Z; the probabilities of these components being defective are respectively 0.01, 0.02 and 0.05. What is the probability that the assembled product will not be defective?

Solution. Let A, B and C denote the events that components X, Y and Z respectively are defective. We are given

$$P(A) = 0.01, \quad P(B) = 0.02, \quad P(C) = 0.05.$$

Taking the components to be defective independently of one another,

$$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) - P(AB) - P(BC) - P(AC) + P(ABC)$$

$$= 0.01 + 0.02 + 0.05 - 0.0002 - 0.0010 - 0.0005 + 0.00001 = 0.07831.$$

Hence the probability that the assembled product will not be defective

$$= 1 - 0.07831 = 0.92169 \text{ or } 0.922.$$
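Under the same independence assumption, the answer can be reached in two ways: directly as the product of the "not defective" probabilities, or by inclusion-exclusion as in the text. The following sketch computes both so that they can be compared. It uses floats, so the comparison allows a tiny tolerance.

```python
from itertools import combinations
from math import prod

p_defect = {"X": 0.01, "Y": 0.02, "Z": 0.05}

# Direct route: the product is good only if no component is defective
p_ok_direct = prod(1 - p for p in p_defect.values())

# Inclusion-exclusion route for P(at least one component defective)
p_any = 0.0
for r in range(1, len(p_defect) + 1):
    for combo in combinations(p_defect.values(), r):
        p_any += (-1) ** (r + 1) * prod(combo)

print(round(p_ok_direct, 5), round(1 - p_any, 5))  # 0.92169 0.92169
```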


Theoretical Distributions 


Frequency distributions are broadly classified under two heads:

1. Observed frequency distributions, and

2. Expected frequency distributions.

So far we have discussed the observed frequency distributions. Such distributions are based on observation and experimentation. For example, we may study the height structure or weight structure of the students of a class and classify the data in the form of a frequency distribution as follows:

Weight (in lb.)     No. of students
90-100                   10
100-110                  12
110-120                  18
120-130                  15
130-140                   8
140 and above             7


As distinguished from this type of distribution, which is based on actual observation, it is possible to deduce mathematically what the frequency distributions of certain populations should be. Such distributions as are expected on the basis of previous experience or theoretical considerations are known as 'theoretical distributions' or probability distributions. For example, if a coin is tossed n times, we expect that as n increases we shall get close to 50% heads and 50% tails. On the basis of this expectation we can test whether a given coin is unbiased or not. If a coin is tossed 100 times, we may get 40 heads and 60 tails. This is our observation. Our expectation is 50% heads and 50% tails. Now the question is whether this discrepancy is due to sampling fluctuations or to the fact that the coin is biased. We cannot say anything about it unless we know the expected behaviour of the coin. It should be carefully noted that the word expect or expectation is used in the sense of an average. The fact that the probabilities for both heads and tails are $\frac{1}{2}$ does not mean that we must necessarily always get 50 per cent heads and 50 per cent tails; it only means that if the experiment is carried out a large number of times we will on an average get close to 50 per cent heads and 50 per cent tails.


Knowledge of the expected behaviour of a phenomenon or, in other words, the expected frequency distribution is of great help in a large number of problems in practical life. Such distributions serve as benchmarks against which to compare observed distributions and act as substitutes for actual distributions when the latter are costly to obtain or cannot be obtained at all. They provide decision-makers with a logical basis for making decisions and are useful in making predictions on the basis of limited information or theoretical considerations. For example, the proprietor of a shoe store must know something about the distribution of the sizes of his potential customers' feet; otherwise he may find himself with a huge stock of shoes which has no market. Similarly, the manufacturer of ready-made garments must know the sizes of collars for which he expects maximum demand so that he has no stock of unwanted sizes. In a similar way, the teachers in a school, college or university should know what they expect from the students. It is only then that they would be in a position to comment on good or bad performance.
Amongst theoretical or expected frequency distributions, the following three are most popular:

1. Binomial Distribution,

2. Poisson Distribution, and

3. Normal Distribution.

These distributions, from the point of view of historical interest as well as their intrinsic importance, occupy a position in the forefront of statistical theory.


1. BINOMIAL DISTRIBUTION 


The Binomial Distribution, also known as the 'Bernoulli Distribution', is associated with the name of the Swiss mathematician James Bernoulli (1654-1705). The binomial distribution is a probability distribution expressing the probability of one set of dichotomous alternatives, i.e., success or failure.

More precisely, the binomial distribution refers to a sequence of events which possess the following properties:

1. A simple experiment is repeated a number of times, where the outcomes are independent, i.e., what happens on the first trial does not affect the second, and so on.

2. Outcomes of each trial can be classified into two mutually exclusive categories, arbitrarily called "successes" and "failures".

3. The probability of success in a single trial, denoted by p, remains the same for all trials. If the probability of success is not the same in each trial, we will not have a binomial distribution. For example, 5 balls are drawn at random from an urn containing 10 white and 20 red balls. This is a binomial experiment if each ball is replaced before another is selected. If the balls are drawn without replacement, the probability of drawing a white ball changes each time a ball is taken from the urn, and we no longer have a binomial experiment.

4. In a given trial the focus is on whether or not the successful outcome has occurred.

5. The experiment is performed under the same conditions for a fixed number of trials, say, n.


This model is useful to answer questions such as this: if we conduct an experiment under the stated conditions n times, what is the probability of obtaining exactly r successes? More specifically, suppose 10 dice are tossed together, or one die is tossed 10 times; what is the probability of obtaining exactly two aces?
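A minimal Python sketch of the dice question, assuming the general binomial formula $P(r) = {}^nC_r\, q^{n-r} p^r$ developed later in this section, with an ace (a one) counted as a success so that p = 1/6. The helper name `binomial_pmf` is our own, not from the text.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials: nCr * q**(n - r) * p**r."""
    q = 1 - p
    return comb(n, r) * q ** (n - r) * p ** r


# Ten dice tossed together (or one die tossed ten times); an ace is a success.
prob_two_aces = binomial_pmf(2, 10, 1 / 6)
print(round(prob_two_aces, 4))  # 0.2907
```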

How the binomial distribution arises can be seen from the following.

If a coin is tossed once there are two outcomes, namely, tail or head. The probability of obtaining a head is p = 1/2 and the probability of obtaining a tail is q = 1/2. Thus (q + p) = 1. These are the terms of the binomial (q + p).

Similarly, if two coins A and B are tossed simultaneously there are four possible outcomes:

A    B
T    T
T    H
H    T
H    H

The probabilities corresponding to these results are:

TT     TH     HT     HH
q²     qp     pq     p²

i.e., q², 2qp and p² for 0, 1 and 2 heads. These are the terms of the binomial (q + p)² because

$$(q+p)^2 = q^2 + 2qp + p^2$$

In the special case where p = q = 1/2, we have

$$\left(\tfrac12+\tfrac12\right)^2 = \tfrac14 + \tfrac12 + \tfrac14$$
Similarly, if three coins A, B and C are tossed, the following are the 8 possible outcomes and the probabilities corresponding to these results:

ABC:    TTT    TTH    THT    HTT    THH    HTH    HHT    HHH
Prob.:  q³     q²p    q²p    q²p    qp²    qp²    qp²    p³

These are the terms of the binomial (q + p)³:

$$(q+p)^3 = q^3 + 3q^2p + 3qp^2 + p^3$$

Where p = q = 1/2, we have

$$\left(\tfrac12+\tfrac12\right)^3 = \tfrac18 + \tfrac38 + \tfrac38 + \tfrac18$$
These probabilities can be calculated by direct count also. For example, the chance of getting 3 tails in a single toss of 3 coins is 1/8. The chance of getting 2 tails (combined with one head) is 3/8, the chance of getting 1 tail (combined with 2 heads) is 3/8 and the chance of getting no tails is 1/8. In general, in n tosses of a coin the probabilities of the various possible events, i.e., (obtaining 0, 1, 2, ..., n heads), are given by the successive terms of the binomial expansion of (q + p)^n, which is

$$(q+p)^n = q^n + {}^nC_1\, q^{n-1}p + {}^nC_2\, q^{n-2}p^2 + \dots + {}^nC_r\, q^{n-r}p^r + \dots + p^n$$
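The direct count can be sketched in Python by listing every equally likely head/tail sequence and tallying heads, then comparing the tally with the successive terms of (q + p)^n. This is a minimal illustration assuming a fair coin; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def heads_by_direct_count(n: int) -> list[Fraction]:
    """List all 2**n equally likely head/tail sequences and tally the number of heads."""
    counts = [0] * (n + 1)
    for outcome in product("HT", repeat=n):
        counts[outcome.count("H")] += 1
    return [Fraction(c, 2 ** n) for c in counts]


def heads_by_expansion(n: int) -> list[Fraction]:
    """Successive terms nCr * q**(n - r) * p**r of (q + p)**n for a fair coin (p = q = 1/2)."""
    p = q = Fraction(1, 2)
    return [comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]


print([str(f) for f in heads_by_direct_count(3)])  # ['1/8', '3/8', '3/8', '1/8']
assert all(heads_by_direct_count(n) == heads_by_expansion(n) for n in range(1, 11))
```

Exact `Fraction` arithmetic is used so that the two methods can be compared for equality rather than approximately.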

These terms may be listed in the form of a probability distribution table as follows:

PROBABILITY TABLE FOR NUMBER OF HEADS

Number of heads (x)     Probability P(x)
0                       $q^n$
1                       ${}^nC_1\, q^{n-1}p$
2                       ${}^nC_2\, q^{n-2}p^2$
3                       ${}^nC_3\, q^{n-3}p^3$
...                     ...
r                       ${}^nC_r\, q^{n-r}p^r$
...                     ...
n                       $p^n$
Since by expanding the binomial (q + p)^n we obtain the probabilities of 0, 1, 2, ..., n heads, the probability distribution is naturally called the binomial probability distribution or simply the binomial distribution. The general form of the distribution is

$$P(r) = {}^nC_r\, q^{n-r} p^r$$

where P(r) denotes the probability of getting exactly r successes.

Thus for an event A with probability of occurrence p and non-occurrence q, if n trials are made, the probability distribution of the number of occurrences of A will be as set out in the above table. It is customary to call the occurrence of an event a 'success' and its non-occurrence a 'failure'.

If we want to obtain the probable frequencies of the various outcomes in N sets of n trials, the following expression shall be used:

$$N(q+p)^n = N\left(q^n + {}^nC_1\, q^{n-1}p + {}^nC_2\, q^{n-2}p^2 + \dots + {}^nC_r\, q^{n-r}p^r + \dots + p^n\right)$$

The frequencies obtained by the above expansion are known as expected or theoretical frequencies. On the other hand, the frequencies actually obtained by making experiments are called actual or observed frequencies. Generally, there is some difference between the observed and expected frequencies, but the difference becomes smaller and smaller as N increases.

It should be noted that the variate in the binomial distribution is discrete and not continuous, i.e., the number of successes (x) takes only integral values.
Obtaining Coefficients of the Binomial 
For obtaining coefficients and exponents for any power of the binomial, the following rules may be remembered. To find the terms of the expansion of (q + p)^n:

1. The first term is $q^n$.

2. The second term is $nq^{n-1}p$.

3. In each succeeding term the power of q is reduced by 1 and the power of p is increased by 1.

4. The coefficient of any term is found by multiplying the coefficient of the preceding term by the power of q in that preceding term, and dividing the product so obtained by one more than the power of p in that preceding term.


When we expand (q + p)^n, we get

$$(q+p)^n = q^n + {}^nC_1\, q^{n-1}p + {}^nC_2\, q^{n-2}p^2 + \dots + p^n$$

where 1, ${}^nC_1$, ${}^nC_2$, ... are called the binomial coefficients. Thus in the expansion of (q + p)^5 we will have

$$(q+p)^5 = q^5 + 5q^4p + 10q^3p^2 + 10q^2p^3 + 5qp^4 + p^5$$

and the coefficients will be 1, 5, 10, 10, 5, 1.
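Rule 4 above can be sketched in Python as a simple recurrence: start from the coefficient 1 of $q^n$ and repeatedly multiply by the power of q and divide by one more than the power of p. This is a minimal illustration; the function name is our own.

```python
def binomial_coefficients(n: int) -> list[int]:
    """Coefficients of (q + p)**n built with rule 4: next coefficient = current coefficient
    times the power of q in the current term, divided by (power of p in the current term + 1)."""
    coefficients = [1]  # rule 1: the first term q**n has coefficient 1
    for k in range(n):
        power_of_q, power_of_p = n - k, k  # exponents in the current term q**(n-k) * p**k
        coefficients.append(coefficients[-1] * power_of_q // (power_of_p + 1))
    return coefficients


print(binomial_coefficients(5))  # [1, 5, 10, 10, 5, 1]
```

Integer division is exact here because each product equals the next binomial coefficient times the divisor.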
From the above binomial expansion, the following general relationships should be noted:

1. The number of terms in a binomial expansion is always n + 1.

2. The exponents of p and q in any single term, when added together, always sum to n.

3. The exponents of q are n, (n - 1), (n - 2), ..., 1, 0, respectively, and the exponents of p are 0, 1, 2, ..., (n - 1), n, respectively (note: p⁰ = 1, q⁰ = 1).

4. The coefficients for the n + 1 terms of the distribution are always symmetrical, ascending to the middle of the series and then descending. When n is an odd number, n + 1 is even and the coefficients of the two central terms are identical.
The coefficients of the binomial expansion can very conveniently be obtained from Pascal's* triangle given below:

PASCAL'S TRIANGLE

Number in      Binomial coefficients                                  Sum
sample (n)
1              1  1                                                     2
2              1  2  1                                                  4
3              1  3  3  1                                               8
4              1  4  6  4  1                                           16
5              1  5  10  10  5  1                                      32
6              1  6  15  20  15  6  1                                  64
7              1  7  21  35  35  21  7  1                             128
8              1  8  28  56  70  56  28  8  1                         256
9              1  9  36  84  126  126  84  36  9  1                   512
10             1  10  45  120  210  252  210  120  45  10  1        1,024

Inspection of the above table will show that each term is derived by adding together the two terms in the line above which lie on either side of it. Thus, when n = 6, the fourth term, 20, is found by adding together the terms 10 and 10 in the line for n = 5. When p = q = 0.5, the expansion yields terms which form a symmetrical distribution, because the probabilities of the two events are equal.
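The construction rule for Pascal's triangle (each inner entry is the sum of the two entries on either side of it in the row above) can be sketched in Python as follows; this is a minimal illustration with a hypothetical function name.

```python
def pascal_triangle(rows: int) -> list[list[int]]:
    """Rows 0..rows of Pascal's triangle; each inner entry is the sum of the two
    entries on either side of it in the row above."""
    triangle = [[1]]
    for _ in range(rows):
        previous = triangle[-1]
        inner = [a + b for a, b in zip(previous, previous[1:])]
        triangle.append([1] + inner + [1])
    return triangle


for n, row in enumerate(pascal_triangle(10)):
    print(n, row, sum(row))  # the sum of row n is 2**n
```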

It should be noted that the binomial distribution is symmetrical if p = 0.5 and skewed if p ≠ 0.5. When p < 0.5 it is skewed to the right; when p > 0.5 it is skewed to the left. The interchange of p and q in any binomial distribution yields its mirror image. The skewness of the binomial distribution, irrespective of the size of p, becomes less pronounced as n increases.
Properties of the Binomial Distribution 

1. The shape and location of the binomial distribution change as p changes for a given n, or as n changes for a given p. As p increases for a fixed n, the binomial distribution shifts to the right. This is demonstrated in the diagrams (page A-206) where a series of line graphs have been prepared for n = 6 and p = 0.10, 0.20, 0.30, 0.50, 0.80 and 0.90.

2. The mode of the binomial distribution is equal to the value of x which has the largest probability. For example, if n = 6 and p = 0.3, the mode is equal to 2, while for n = 6 and p = 0.9 the mode is equal to 6. The mean and mode are equal if np is an integer. For example, when n = 6 and p = 0.50, the mean and mode are both equal to 3. For fixed n, both the mean and mode increase as p increases.

3. As n increases for a fixed p, the binomial distribution moves to the right, flattens, and spreads out. The mean of the binomial distribution, np, obviously increases as n increases with p held constant. For larger n there are more possible outcomes of a binomial experiment and the probability associated with any particular outcome becomes smaller.

* Blaise Pascal was a famous French philosopher and mathematician.


[Figure: Binomial probability distribution for n = 6, line graphs for p = 0.10, 0.20, 0.30, 0.50, 0.80 and 0.90, with x = 0 to 6 on the horizontal axis]


4. If n is large and if neither p nor q is too close to zero, the binomial distribution can be closely approximated by a normal distribution with standardized variable given by

$$z = \frac{x - np}{\sqrt{npq}}$$

The approximation becomes better with increasing n, and in the limiting case it is exact.
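Property 2 (the mode is the value of x with the largest probability) can be checked with a short Python sketch that simply searches all x = 0, ..., n. This is an illustration by direct search, not a closed-form rule; the function name is our own.

```python
from math import comb


def binomial_mode(n: int, p: float) -> int:
    """Return the value of x whose probability nCx * q**(n-x) * p**x is largest (direct search)."""
    probs = [comb(n, x) * (1 - p) ** (n - x) * p ** x for x in range(n + 1)]
    return max(range(n + 1), key=lambda x: probs[x])


for p in (0.3, 0.5, 0.9):
    print(p, binomial_mode(6, p))  # modes 2, 3 and 6, matching property 2
```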
Constants of the Binomial Distribution

The mean of the binomial distribution is np and the standard deviation is $\sqrt{npq}$.

Proof. If p is the probability of success and q the probability of failure in one trial, then in n independent trials the probabilities of 0, 1, 2, 3, ..., n successes are given by the 1st, 2nd, 3rd, ..., (n + 1)th terms of the binomial expansion (q + p)^n. Thus we have

x      p(x)                                                       x·p(x)
0      $q^n$                                                       0
1      ${}^nC_1\, q^{n-1}p = nq^{n-1}p$                             $nq^{n-1}p$
2      ${}^nC_2\, q^{n-2}p^2 = \frac{n(n-1)}{2\times1}q^{n-2}p^2$    $n(n-1)q^{n-2}p^2$
...    ...                                                        ...
n      $p^n$                                                       $np^n$

The arithmetic mean by definition is $\sum x\,p(x)$:

$$\sum x\,p(x) = 0 + nq^{n-1}p + n(n-1)q^{n-2}p^2 + \dots + np^n$$

Taking np common,

$$= np\left[q^{n-1} + (n-1)q^{n-2}p + \dots + p^{n-1}\right]$$

$$= np\,(q+p)^{n-1} \quad \text{[since the expression in brackets is the expansion of the binomial } (q+p)^{n-1}\text{]}$$

$$= np\,(1)^{n-1} = np \quad [\because q + p = 1]$$

Thus $\sum x\,p(x) = np$ (since the sum of probabilities is 1).

Thus the mean of the binomial distribution is np.
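The standard deviation of the binomial distribution is $\sqrt{npq}$.

Proof. $\sigma^2 = \mu_2 = \nu_2 - \nu_1^2$ (where $\nu_1$ and $\nu_2$ are moments about the origin, zero)

$$\nu_2 = \sum x^2 p(x), \qquad \nu_1 = np$$

$$\sum x^2 p(x) = 0^2 q^n + 1^2\, nq^{n-1}p + 2^2\,\frac{n(n-1)}{2\times1}q^{n-2}p^2 + 3^2\,\frac{n(n-1)(n-2)}{3\times2\times1}q^{n-3}p^3 + \dots + n^2p^n$$

$$= np\left[q^{n-1} + 2(n-1)q^{n-2}p + \frac{3(n-1)(n-2)}{2\times1}q^{n-3}p^2 + \dots + np^{n-1}\right]$$

Breaking the second, third and following terms into parts (writing 2 = 1 + 1, 3 = 1 + 2, ..., n = 1 + (n - 1)), we get

$$\sum x^2 p(x) = np\left[\left(q^{n-1} + (n-1)q^{n-2}p + \frac{(n-1)(n-2)}{2\times1}q^{n-3}p^2 + \dots + p^{n-1}\right) + \left((n-1)q^{n-2}p + \frac{2(n-1)(n-2)}{2\times1}q^{n-3}p^2 + \dots + (n-1)p^{n-1}\right)\right]$$

$$= np\left[(q+p)^{n-1} + (n-1)p\left(q^{n-2} + (n-2)q^{n-3}p + \dots + p^{n-2}\right)\right]$$

$$= np\left[1 + (n-1)p\,(q+p)^{n-2}\right]$$

[since $(q+p)^{n-1} = 1$ and the expression in the inner brackets is the expansion of $(q+p)^{n-2} = 1$]

$$= np\left[1 + (n-1)p \cdot 1\right] = np\left[1 + np - p\right] = np + n^2p^2 - np^2$$

$$\mu_2 = \nu_2 - \nu_1^2 = np + n^2p^2 - np^2 - (np)^2 = np - np^2 = np(1-p) = npq \quad [\because 1 - p = q]$$

$$\sigma = \sqrt{\mu_2} = \sqrt{npq}$$

Thus the standard deviation of the binomial distribution is $\sqrt{npq}$, or the variance is npq.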
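The two results can be checked numerically with a short Python sketch that computes $\nu_1 = \sum x\,p(x)$ and $\nu_2 - \nu_1^2$ directly from the probabilities. The values n = 10 and p = 0.3 are arbitrary choices for illustration.

```python
from math import comb


def binomial_mean_variance(n: int, p: float) -> tuple[float, float]:
    """Mean = sum of x p(x); variance = nu2 - nu1**2, using moments about the origin."""
    q = 1 - p
    probs = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]
    nu1 = sum(x * px for x, px in enumerate(probs))
    nu2 = sum(x * x * px for x, px in enumerate(probs))
    return nu1, nu2 - nu1 ** 2


mean, variance = binomial_mean_variance(10, 0.3)
print(round(mean, 6), round(variance, 6))  # should agree with np = 3 and npq = 2.1
```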
In a similar manner as above we can show that for a binomial distribution

$$\mu_3 = npq(q-p) \quad \text{and} \quad \mu_4 = 3n^2p^2q^2 + npq(1-6pq)$$

In case of a theoretical distribution, $\sum p(x)$ is always 1.

From the various moments the values of $\beta_1$ and $\beta_2$ can also be computed:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{n^2p^2q^2(q-p)^2}{n^3p^3q^3} = \frac{(q-p)^2}{npq}$$

If $\beta_1 = 0$ the distribution is symmetrical, i.e., there is no skewness. Prof. Fisher gave the following measure of skewness:

$$\gamma_1 = \sqrt{\beta_1} = \frac{q-p}{\sqrt{npq}}$$

If $\gamma_1 = 0$ the distribution is symmetrical; if $\gamma_1$ is more than zero the distribution is positively skewed and if it is less than zero the distribution is negatively skewed. It should be noted that $\gamma_1$ is a better measure of skewness than $\beta_1$ because $\beta_1$ can never be negative, whereas $\gamma_1$ can be positive as well as negative.

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3n^2p^2q^2 + npq(1-6pq)}{n^2p^2q^2} = 3 + \frac{1-6pq}{npq}$$

Prof. Fisher gave the following measure of kurtosis:

$$\gamma_2 = \beta_2 - 3$$

If the distribution is normal, $\gamma_1$ would be zero and $\gamma_2$ would also be zero. But the converse is not true, i.e., even if both $\gamma_1$ and $\gamma_2$ are zero the distribution is not necessarily normal. If $\gamma_2$ is positive, the distribution is leptokurtic and if $\gamma_2$ is negative the distribution is platykurtic.

The various constants of the binomial distribution can be listed in the following table:

Mean = np
Standard deviation = $\sqrt{npq}$
First moment, $\mu_1 = 0$
Second moment, $\mu_2 = npq$
Third moment, $\mu_3 = npq(q-p)$
Fourth moment, $\mu_4 = 3n^2p^2q^2 + npq(1-6pq)$
$\beta_1 = \dfrac{(q-p)^2}{npq}$
$\beta_2 = 3 + \dfrac{1-6pq}{npq}$

Importance of the Binomial Distribution 


The binomial probability distribution is a discrete probability distribution that is useful in describing an enormous variety of real-life events. For example, a quality control inspector may want to know the probability of obtaining a given number of bad light bulbs in a random sample of 10 bulbs when a known percentage of the bulbs is defective. He can quickly obtain the answer from tables of the binomial probability distribution. The binomial distribution can be used when:

1. The outcome or result of each trial in the process is characterized as one of two types of possible outcomes (success or failure).

2. The probability of outcome of any trial does not change and is independent of the results of previous trials.
The following examples will illustrate the applications of the binomial distribution:

Illustration 1. A coin is tossed six times. What is the probability of obtaining four or more heads?

Solution. When a coin is tossed, the probabilities of head and tail in the case of an unbiased coin are equal, i.e., p = q = 1/2.

The various possibilities for all the events are the terms of the expansion (q + p)^6:

$$(q+p)^6 = q^6 + 6q^5p + 15q^4p^2 + 20q^3p^3 + 15q^2p^4 + 6qp^5 + p^6$$

The probability of obtaining 4 heads is

$$15q^2p^4 = 15\left(\tfrac12\right)^2\left(\tfrac12\right)^4 = \tfrac{15}{64} = 0.234$$

The probability of obtaining 5 heads is

$$6qp^5 = 6\left(\tfrac12\right)\left(\tfrac12\right)^5 = \tfrac{6}{64} = 0.094$$

The probability of obtaining 6 heads is

$$p^6 = \left(\tfrac12\right)^6 = \tfrac{1}{64} = 0.016$$

∴ The probability of obtaining 4 or more heads is

0.234 + 0.094 + 0.016 = 0.344.
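The same sum of terms can be sketched in Python with exact fractions, which avoids the rounding in the three decimal terms above.

```python
from fractions import Fraction
from math import comb

n = 6
p = q = Fraction(1, 2)  # unbiased coin
# Add the terms of (q + p)**6 for 4, 5 and 6 heads.
prob_four_or_more = sum(comb(n, r) * q ** (n - r) * p ** r for r in range(4, n + 1))
print(prob_four_or_more, float(prob_four_or_more))  # 11/32 0.34375
```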
Illustration 2. Assuming that half the population is vegetarian so that the chance of an individual being a vegetarian is 1/2, and assuming that 100 investigators can take samples of 10 individuals each to see whether they are vegetarians, how many investigators would you expect to report that three people or less were vegetarians?
(M.A. Econ. Delhi, 1966)

Solution. Probability of a person being vegetarian, p = 1/2

∴ q = 1 - p = 1 - 1/2 = 1/2

By expanding the binomial $100\left(\tfrac12+\tfrac12\right)^{10}$, we get the number of investigators who are expected to report 0, 1, 2, ..., 10 people who are vegetarians. The number of investigators who are expected to report that three people or less were vegetarians is given by

$$Nq^{10} + N\,{}^{10}C_1\, q^9p + N\,{}^{10}C_2\, q^8p^2 + N\,{}^{10}C_3\, q^7p^3$$

$$= 100\left(\tfrac12\right)^{10} + 100\times10\left(\tfrac12\right)^9\left(\tfrac12\right) + 100\times45\left(\tfrac12\right)^8\left(\tfrac12\right)^2 + 100\times120\left(\tfrac12\right)^7\left(\tfrac12\right)^3$$

$$= \frac{100}{1024} + \frac{1000}{1024} + \frac{4500}{1024} + \frac{12000}{1024} = \frac{17600}{1024} = 17.19$$

Hence about 17 investigators would report that 3 or less people were vegetarian.
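A minimal Python sketch of the same calculation: multiply the probability of 0 to 3 vegetarians in a sample of 10 by N = 100 investigators, using exact fractions.

```python
from fractions import Fraction
from math import comb

N, n = 100, 10  # 100 investigators, samples of 10 people each
p = q = Fraction(1, 2)
# Expected number of investigators reporting 0, 1, 2 or 3 vegetarians.
expected = N * sum(comb(n, r) * q ** (n - r) * p ** r for r in range(0, 4))
print(expected, float(expected))  # 275/16 17.1875
```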


Illustration 3. If the probability of a defective bolt is 0.1, find (a) the mean and standard deviation for the distribution of defective bolts in a total of 500, and (b) the moment coefficients of skewness and kurtosis of the distribution.

Solution. (a) p = 0.1, n = 500

Mean = np = 500 × 0.1 = 50.

Thus we can expect 50 bolts to be defective.

$\sigma = \sqrt{npq}$, where n = 500, p = 0.1 and q = 0.9

$$\sigma = \sqrt{500\times0.1\times0.9} = \sqrt{45} = 6.71$$
(b) Moment coefficient of skewness, i.e., $\gamma_1$:

$$\gamma_1 = \frac{q-p}{\sqrt{npq}} = \frac{0.9-0.1}{6.71} = \frac{0.8}{6.71} = 0.119$$

Since $\gamma_1$ is more than zero the distribution is positively skewed. However, the skewness is very moderate.

Moment coefficient of kurtosis:

$$\beta_2 = 3 + \frac{1-6pq}{npq} = 3 + \frac{1-6(0.1)(0.9)}{45} = 3 + \frac{0.46}{45} = 3.01$$

$$\gamma_2 = \beta_2 - 3 = 3.01 - 3 = 0.01$$

Since $\gamma_2$ is positive the distribution is (slightly) leptokurtic.
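The constants of Illustration 3 can be reproduced with a short Python sketch using the formulas from the table of binomial constants.

```python
from math import sqrt

n, p = 500, 0.1
q = 1 - p
mean = n * p
sd = sqrt(n * p * q)
gamma1 = (q - p) / sd                       # moment coefficient of skewness
beta2 = 3 + (1 - 6 * p * q) / (n * p * q)
gamma2 = beta2 - 3                          # positive, so slightly leptokurtic
print(f"mean={mean:.1f} sd={sd:.2f} gamma1={gamma1:.3f} beta2={beta2:.4f} gamma2={gamma2:.4f}")
```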
Illustration 4. The incidence of occupational disease in an industry is such that the workmen have a 20% chance of suffering from it. What is the probability that out of six workmen 4 or more will contract the disease?

Solution. The probability of a man suffering from the disease, p = 20/100 = 1/5.

The probability of a man not suffering from the disease, q = 1 - 1/5 = 4/5.

Hence the probabilities of 0, 1, 2, ..., 6 men suffering from the disease are the terms in the binomial expansion of $\left(\tfrac45+\tfrac15\right)^6$:

$$(q+p)^6 = q^6 + 6q^5p + 15q^4p^2 + 20q^3p^3 + 15q^2p^4 + 6qp^5 + p^6$$

The probability of 4 or more, i.e., 4, 5 or 6 successes is

$$15q^2p^4 + 6qp^5 + p^6 = 15\left(\tfrac45\right)^2\left(\tfrac15\right)^4 + 6\left(\tfrac45\right)\left(\tfrac15\right)^5 + \left(\tfrac15\right)^6$$

$$= \frac{15\times16}{15625} + \frac{6\times4}{15625} + \frac{1}{15625} = \frac{265}{15625} = \frac{53}{3125} = 0.017$$
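A minimal Python check of Illustration 4 with exact fractions (p = 1/5, n = 6):

```python
from fractions import Fraction
from math import comb

n, p = 6, Fraction(1, 5)  # 20% chance of contracting the disease
q = 1 - p
prob_four_or_more = sum(comb(n, r) * q ** (n - r) * p ** r for r in range(4, n + 1))
print(prob_four_or_more, float(prob_four_or_more))  # 53/3125 0.01696
```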


Fitting a Binomial Distribution 

When a binomial distribution is to be fitted to observed data the following procedure is adopted:

1. Determine the values of p and q. If one of these values is known, the other can be found out by the simple relationship p = (1 - q) and q = (1 - p). When p and q are equal the distribution is symmetrical, for p and q may be interchanged without altering the value of any term, and consequently terms equidistant from the two ends of the series are equal. If p and q are unequal, the distribution is skew. If p is less than 1/2, the distribution is positively skewed, and when p is more than 1/2 the distribution is negatively skewed.

2. Expand the binomial (q + p)^n. The power n is equal to one less than the number of terms in the expanded binomial. Thus when two coins are tossed (n = 2) there will be three terms in the binomial. Similarly, when four coins are tossed (n = 4) there will be five terms, and so on.

3. Multiply each term of the expanded binomial by N (the total frequency), in order to obtain the expected frequency in each category.
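The three steps can be sketched in Python as one small function. This is a minimal illustration assuming p is already known; the function name is hypothetical.

```python
from math import comb


def expected_binomial_frequencies(N: int, n: int, p: float) -> list[float]:
    """Steps 1-3: take q = 1 - p, expand (q + p)**n and multiply each term by N."""
    q = 1 - p
    return [N * comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]


# Eight fair coins tossed 256 times (the setting of the next illustration).
print(expected_binomial_frequencies(256, 8, 0.5))
# [1.0, 8.0, 28.0, 56.0, 70.0, 56.0, 28.0, 8.0, 1.0]
```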
The following example shall illustrate the procedure:

Illustration 5. Eight coins are tossed at a time, 256 times. The number of heads observed at each throw is recorded and the results are given below. Find the expected frequencies. What are the theoretical values of mean and standard deviation? Calculate also the mean and S.D. of the observed frequencies.

No. of heads at a throw    Frequency        No. of heads at a throw    Frequency
0                          2                5                          56
1                          6                6                          32
2                          30               7                          10
3                          52               8                          1
4                          67

Solution. The chance of getting a head in a single throw of one coin is 1/2.

Hence p = 1/2, q = 1/2, n = 8, N = 256.

By expanding $256\left(\tfrac12+\tfrac12\right)^8$ we shall get the expected frequencies of 0, 1, 2, ..., 8 heads (successes).

No. of heads (x)    Expected frequency $N\times{}^nC_x\, q^{n-x}p^x$
0                   $256\times\left(\tfrac12\right)^8 = 1$
1                   $256\times{}^8C_1\left(\tfrac12\right)^7\left(\tfrac12\right) = 8$
2                   $256\times{}^8C_2\left(\tfrac12\right)^6\left(\tfrac12\right)^2 = 28$
3                   $256\times{}^8C_3\left(\tfrac12\right)^5\left(\tfrac12\right)^3 = 56$
4                   $256\times{}^8C_4\left(\tfrac12\right)^4\left(\tfrac12\right)^4 = 70$
5                   $256\times{}^8C_5\left(\tfrac12\right)^3\left(\tfrac12\right)^5 = 56$
6                   $256\times{}^8C_6\left(\tfrac12\right)^2\left(\tfrac12\right)^6 = 28$
7                   $256\times{}^8C_7\left(\tfrac12\right)\left(\tfrac12\right)^7 = 8$
8                   $256\times\left(\tfrac12\right)^8 = 1$
Total               256

The mean of the above distribution is np = 8 × 1/2 = 4.

The standard deviation is $\sqrt{npq} = \sqrt{8\times\tfrac12\times\tfrac12} = \sqrt2 = 1.414$.

These are the mean and standard deviation of the expected frequencies. The mean and standard deviation of the observed frequencies (taking deviations d from the assumed mean A = 4) shall be:

x     f      d      fd      fd²
0     2      -4     -8      32
1     6      -3     -18     54
2     30     -2     -60     120
3     52     -1     -52     52
4     67     0      0       0
5     56     1      56      56
6     32     2      64      128
7     10     3      30      90
8     1      4      4       16
      N = 256       Σfd = 16   Σfd² = 548

$$\bar X = A + \frac{\sum fd}{N} = 4 + \frac{16}{256} = 4.06$$

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{548}{256} - \left(\frac{16}{256}\right)^2} = \sqrt{2.14 - 0.0039} = \sqrt{2.137} = 1.462$$
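The observed mean and standard deviation of Illustration 5 can be checked with a short Python sketch that works from the raw frequencies rather than from deviations about an assumed mean.

```python
from math import sqrt

heads = list(range(9))
observed = [2, 6, 30, 52, 67, 56, 32, 10, 1]
N = sum(observed)
mean = sum(x * f for x, f in zip(heads, observed)) / N
variance = sum(f * (x - mean) ** 2 for x, f in zip(heads, observed)) / N
sd = sqrt(variance)
print(N, mean, round(sd, 3))  # 256 4.0625 1.462
```

The observed values (4.06, 1.462) are close to the theoretical values np = 4 and $\sqrt{npq} = 1.414$.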
Illustration 6. The following data show the number of seeds germinating out of 10 on damp filter paper for 80 sets of seeds. Fit a binomial distribution to this data:

X:    0    1    2    3    4    5    6    7    8    9    10
f:    6    20   28   12   8    6    0    0    0    0    0
(B.Sc., Agra, 1973)

Solution. FITTING BINOMIAL DISTRIBUTION

X     f      fX
0     6      0
1     20     20
2     28     56
3     12     36
4     8      32
5     6      30
6     0      0
7     0      0
8     0      0
9     0      0
10    0      0
      N = 80     ΣfX = 174

$$\bar X = \frac{\sum fX}{N} = \frac{174}{80} = 2.175$$

But mean = np, with n = 10, so

$$p = \frac{2.175}{10} = 0.2175, \qquad q = 1 - p = 0.7825$$

Hence the binomial distribution to be fitted to the data is

$$80(0.7825 + 0.2175)^{10}$$

The theoretical frequencies are the successive terms in the expansion of $80(0.7825+0.2175)^{10}$ and are tabulated below:

X     Theoretical frequencies
0     $80\times(0.7825)^{10} = 6.9$
1     $80\times10(0.7825)^9(0.2175) = 19.1$
2     $80\times45(0.7825)^8(0.2175)^2 = 23.9$
3     $80\times120(0.7825)^7(0.2175)^3 = 17.7$
4     $80\times210(0.7825)^6(0.2175)^4 = 8.6$
5     $80\times252(0.7825)^5(0.2175)^5 = 2.9$
6     $80\times210(0.7825)^4(0.2175)^6 = 0.7$
7     $80\times120(0.7825)^3(0.2175)^7 = 0.1$
8     $80\times45(0.7825)^2(0.2175)^8 = 0.0$
9     $80\times10(0.7825)(0.2175)^9 = 0.0$
10    $80\times(0.2175)^{10} = 0.0$
Total ≈ 80
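A minimal Python sketch of the fit in Illustration 6: estimate p from the observed mean (mean = np), then multiply the terms of $(q+p)^{10}$ by N. The observed frequencies used are those in the table (6, 20, 28, 12, 8, 6 for 0 to 5 seeds, zero thereafter).

```python
from math import comb

seeds = list(range(11))
observed = [6, 20, 28, 12, 8, 6, 0, 0, 0, 0, 0]
N, n = sum(observed), 10
mean = sum(x * f for x, f in zip(seeds, observed)) / N  # 174 / 80
p = mean / n                                            # since mean = np
q = 1 - p
expected = [N * comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]
print(round(p, 4), [round(e, 1) for e in expected])
```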


Illustration 7. Twelve dice were thrown 4,096 times, and a throw of 4, 5 or 6 spots on a die was considered to be a success, while the others were considered failures. Calculate the theoretical frequencies for 0, 1, 2, ..., 12 successes.

Solution. There are 4,096 trials. Since either 4, 5 or 6 is considered a success, p = 3/6 = 1/2 and q = 1/2. The terms of the binomial (q + p)^12 will give the probabilities of 0, 1, 2, ..., 12 successes.

Here n = 12, q = 1/2 and p = 1/2.

By expanding $4096\left(\tfrac12+\tfrac12\right)^{12}$ we get the frequencies corresponding to 0, 1, 2, ..., 12 successes:

$$4096\left(\frac{1}{4096} + \frac{12}{4096} + \frac{66}{4096} + \frac{220}{4096} + \frac{495}{4096} + \frac{792}{4096} + \frac{924}{4096} + \frac{792}{4096} + \frac{495}{4096} + \frac{220}{4096} + \frac{66}{4096} + \frac{12}{4096} + \frac{1}{4096}\right)$$

Let $f_o$ denote observed frequencies and $f_e$ expected frequencies. The observed frequencies cannot be in fractions, but the expected frequencies may be in fractions; however, they may be approximated to whole numbers.

The results can be tabulated as follows:

Number of successes    Theoretical frequencies    Number of successes    Theoretical frequencies
0                      1                          7                      792
1                      12                         8                      495
2                      66                         9                      220
3                      220                        10                     66
4                      495                        11                     12
5                      792                        12                     1
6                      924


Illustration 8. The following is the frequency distribution of 128 throws of seven coins, according to the number of heads:

No. of heads    0    1    2    3    4    5    6    7    Total
Throws          7    6    19   35   30   23   7    1    128

Fit a binomial distribution under the hypothesis that the coins are unbiased. What are the mean and the standard deviation of the fitted distribution?
(M.Com. Delhi, 1971)

Solution. On the hypothesis that the coins are unbiased, p = q = 1/2, N = 128 and n = 7.

The probabilities of 0, 1, 2, ..., 7 heads will be given by the expansion of $\left(\tfrac12+\tfrac12\right)^7$:

$$\left(\tfrac12+\tfrac12\right)^7 = {}^7C_0\left(\tfrac12\right)^7 + {}^7C_1\left(\tfrac12\right)^6\left(\tfrac12\right) + {}^7C_2\left(\tfrac12\right)^5\left(\tfrac12\right)^2 + \dots + {}^7C_7\left(\tfrac12\right)^7 = \left(\tfrac12\right)^7\left[1 + 7 + 21 + 35 + 35 + 21 + 7 + 1\right]$$

In order to obtain the frequencies we will have to multiply each term by N, i.e., 128:

$$128\left(\tfrac12+\tfrac12\right)^7 = 128\times\frac{1}{128}\left(1 + 7 + 21 + 35 + 35 + 21 + 7 + 1\right)$$

Thus the expected frequencies are:

x      0    1    2    3    4    5    6    7
f_e    1    7    21   35   35   21   7    1

Mean and standard deviation of this distribution:

The mean of the binomial distribution is np and the standard deviation is $\sqrt{npq}$.

Here n = 7, p = 1/2, q = 1/2.

∴ Mean = 7 × 1/2 = 3.5 and

Standard deviation = $\sqrt{7\times\tfrac12\times\tfrac12} = \sqrt{1.75} = 1.32$

II. POISSON DISTRIBUTION

The Poisson distribution is a discrete probability distribution and is very widely used in statistical work. It was originated by the French mathematician Simeon Denis Poisson in 1837. Strictly speaking, the Poisson distribution is the limiting form of the binomial distribution as n becomes infinitely large and p approaches zero in such a way that np = m remains constant. Such situations are fairly common. That is to say, a Poisson distribution may be expected in cases where the chance of any individual event being a success is small. The distribution is used to describe the behaviour of rare events and has been called the "law of improbable events". In recent years statisticians have had a renewed interest in the occurrence of comparatively rare events, such as serious floods, accidental release of radiation from a nuclear reactor, and the like.


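Proof. In the case of the binomial distribution the probability of r successes is given by

$$P(r) = {}^nC_r\, q^{n-r}p^r = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r!}\, q^{n-r}p^r$$

Put $p = \frac{m}{n}$, so that $q = 1 - p = 1 - \frac{m}{n}$. We now get

$$P(r) = \frac{n(n-1)\cdots(n-r+1)}{r!}\left(\frac{m}{n}\right)^r\left(1-\frac{m}{n}\right)^{n-r} = \frac{m^r}{r!}\left(1-\frac1n\right)\left(1-\frac2n\right)\cdots\left(1-\frac{r-1}{n}\right)\frac{\left(1-\frac{m}{n}\right)^n}{\left(1-\frac{m}{n}\right)^r}$$

As $n \to \infty$ with m fixed, each of the factors $\left(1-\frac1n\right), \dots, \left(1-\frac{r-1}{n}\right)$ and $\left(1-\frac{m}{n}\right)^r$ tends to 1, while $\left(1-\frac{m}{n}\right)^n$ tends to $e^{-m}$. Hence

$$P(r) = \frac{e^{-m}m^r}{r!}, \qquad r = 0, 1, 2, \dots$$

This is called the Poisson probability distribution or, more briefly, the Poisson distribution.

Here e = 2.7183 (the base of natural logarithms),
m is a positive constant equal to the mean of the distribution,
r is any non-negative integer (0, 1, 2, ...).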

The Poisson distribution is a discrete distribution with a single parameter m. As m increases, the distribution shifts to the right. This is illustrated in the diagram (page A-214) for 4 values from m = 0.3 to m = 4.0.

The Poisson distribution can frequently be used to approximate the binomial distribution when n is large and p is very small.
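The limiting argument can be illustrated numerically with a Python sketch that keeps np = m fixed while n grows, and compares the binomial probability of r successes with $e^{-m}m^r/r!$. The values m = 2 and r = 3 are arbitrary choices for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(r: int, n: int, p: float) -> float:
    return comb(n, r) * (1 - p) ** (n - r) * p ** r


def poisson_pmf(r: int, m: float) -> float:
    return exp(-m) * m ** r / factorial(r)


m, r = 2.0, 3
gaps = []
for n in (10, 100, 1000, 10000):
    b = binomial_pmf(r, n, m / n)  # p = m/n shrinks as n grows, np = m stays fixed
    gaps.append(abs(b - poisson_pmf(r, m)))
    print(f"n={n:>5}  binomial={b:.6f}  poisson={poisson_pmf(r, m):.6f}")
```

The gap between the two probabilities shrinks steadily as n increases, which is the sense in which the Poisson distribution is the limiting form of the binomial.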

Form of the Poisson Distribution

Like the binomial distribution, the variate of the Poisson distribution is also discrete, i.e., it takes only integral values. The probabilities of 0, 1, 2, ... successes are given by the successive terms of the expansion

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \dots + \frac{m^r}{r!} + \dots\right)$$

This can be written in tabular form as follows:

No. of successes (x)    Probability p(x)
0                       $e^{-m}$
1                       $me^{-m}$
2                       $\dfrac{m^2e^{-m}}{2!}$
3                       $\dfrac{m^3e^{-m}}{3!}$
4                       $\dfrac{m^4e^{-m}}{4!}$
...                     ...
r                       $\dfrac{m^re^{-m}}{r!}$

where e = 2.7183 and m is a constant called the parameter of the distribution; m is the average number of occurrences of an event.

The above table gives probabilities. If we want to know the expected number of occurrences for different numbers of successes, we have to multiply each term by N, i.e., the total number of observations.


Constants of the Poisson Distribution

Since p is very small in the case of the Poisson distribution, the value of q is almost equal to 1. The constants of the Poisson distribution can thus be easily obtained by putting 1 in place of q (and m in place of np) in the constants of the binomial distribution.

The mean of the Poisson distribution is m and the standard deviation is $\sqrt m$. The proof of this is given below:

Proof. The Poisson distribution is given as

No. of successes (x)    0           1            2                          3                          ...
Probability p(x)        $e^{-m}$    $me^{-m}$    $\dfrac{m^2e^{-m}}{2!}$    $\dfrac{m^3e^{-m}}{3!}$    ...

Find the mean and variance.

CALCULATION OF MEAN AND VARIANCE

x     p(x)                        x·p(x)
0     $e^{-m}$                     0
1     $me^{-m}$                    $me^{-m}$
2     $\dfrac{m^2e^{-m}}{2!}$      $m^2e^{-m}$
3     $\dfrac{m^3e^{-m}}{3!}$      $\dfrac{m^3e^{-m}}{2!}$
4     $\dfrac{m^4e^{-m}}{4!}$      $\dfrac{m^4e^{-m}}{3!}$

Mean = $\sum x\,p(x)$

$$\sum x\,p(x) = 0 + me^{-m} + m^2e^{-m} + \frac{m^3e^{-m}}{2!} + \frac{m^4e^{-m}}{3!} + \dots$$

$$= me^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \dots\right)$$

$$= me^{-m}\cdot e^m = me^0 = m \qquad \left[\because e^m = 1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \dots\right]$$

Hence the mean of the Poisson distribution is m.
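Variance, $\mu_2 = \nu_2 - \nu_1^2$ (where $\nu_1$ and $\nu_2$ denote moments about the origin, zero)

$$\nu_2 = \sum x^2 p(x)$$

x     p(x)                        x²·p(x)
0     $e^{-m}$                     0
1     $me^{-m}$                    $me^{-m}$
2     $\dfrac{m^2e^{-m}}{2!}$      $2m^2e^{-m}$
3     $\dfrac{m^3e^{-m}}{3!}$      $\dfrac{3m^3e^{-m}}{2!}$
4     $\dfrac{m^4e^{-m}}{4!}$      $\dfrac{4m^4e^{-m}}{3!}$

$$\sum x^2p(x) = 0 + me^{-m} + 2m^2e^{-m} + \frac{3m^3e^{-m}}{2!} + \frac{4m^4e^{-m}}{3!} + \dots = me^{-m}\left(1 + 2m + \frac{3m^2}{2!} + \frac{4m^3}{3!} + \dots\right)$$

Breaking each term within the brackets into two parts, we have

$$\sum x^2p(x) = me^{-m}\left[\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \dots\right) + \left(m + \frac{2m^2}{2!} + \frac{3m^3}{3!} + \dots\right)\right]$$

$$= me^{-m}\left[e^m + m\left(1 + m + \frac{m^2}{2!} + \dots\right)\right] = me^{-m}\left(e^m + me^m\right) = me^{-m}e^m(1+m) = m(1+m) = m + m^2$$

$$\mu_2 = \nu_2 - \nu_1^2 = m + m^2 - (m)^2 = m \qquad [\because \nu_1 = m]$$

Thus $\sigma^2$ or $\mu_2 = m$ and $\sigma = \sqrt m$.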


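In a similar manner we can show that for the Poisson distribution

$$\mu_3 = m \quad \text{and} \quad \mu_4 = m + 3m^2$$

Thus the four moments of the Poisson distribution are:

$$\mu_1 = 0, \quad \mu_2 = m, \quad \mu_3 = m, \quad \mu_4 = m + 3m^2$$

From these we can determine the values of $\beta_1$ and $\beta_2$:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{m^2}{m^3} = \frac1m$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{m + 3m^2}{m^2} = 3 + \frac1m$$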
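These moments can be checked numerically with a Python sketch that builds the Poisson probabilities from $p(0) = e^{-m}$ and $p(x) = p(x-1)\,m/x$, truncating the series after 60 terms (an assumption that is harmless for small m, since the neglected tail is vanishingly small). The value m = 2.25 is the mean used in Illustration 9.

```python
from math import exp, isclose


def poisson_probabilities(m: float, terms: int = 60) -> list[float]:
    """p(0) = e**-m, then p(x) = p(x-1) * m / x; the tail beyond `terms` is neglected."""
    probs = [exp(-m)]
    for x in range(1, terms):
        probs.append(probs[-1] * m / x)
    return probs


m = 2.25
probs = poisson_probabilities(m)
mean = sum(x * px for x, px in enumerate(probs))
mu2, mu3, mu4 = (sum((x - mean) ** k * px for x, px in enumerate(probs)) for k in (2, 3, 4))
assert isclose(mu4, m + 3 * m ** 2)
print(round(mean, 4), round(mu2, 4), round(mu3, 4), round(mu4, 4))
print(round(mu3 ** 2 / mu2 ** 3, 4), round(mu4 / mu2 ** 2, 4))  # beta1 = 1/m, beta2 = 3 + 1/m
```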


One great advantage of the Poisson distribution is that we need only the value of the mean in order to compute the values of the various constants. This shall be clear from the following illustration:

Illustration 9. The mean of a Poisson distribution is 2.25. Find the other constants of the distribution.

Solution. We are given the mean, m = 2.25.

$\sigma = \sqrt m = \sqrt{2.25} = 1.5$

$\mu_1 = 0$

$\mu_2 = m = 2.25$

$\mu_3 = m = 2.25$

$\mu_4 = m + 3m^2 = 2.25 + 3(2.25)^2 = 2.25 + 15.1875 = 17.4375$ or 17.44 approx.

$\beta_1 = \dfrac1m = \dfrac{1}{2.25} = 0.444$

$\beta_2 = 3 + \dfrac1m = 3 + 0.444 = 3.444$

The Poisson distribution is widely used in queuing theory. For details please refer to Statistical Analysis by Chou, p. 708.


Role of the Poisson Distribution

The Poisson distribution is used in practice in a wide variety of problems where there are infrequently occurring events with respect to time, area, volume or similar units. For example, it is used in quality control statistics to count the number of defects of an item, in biology to count the number of bacteria, in physics to count the number of particles emitted from a radioactive substance, in insurance problems to count the number of casualties, or in waiting-time problems to count the number of incoming telephone calls or incoming customers, and so forth.* Similarly, the Poisson distribution is extremely useful in determining the number of deaths in a district in a given period, say, a year, by a rare disease, the number of typographical errors per page in typed material, the number of deaths as a result of road accidents, etc. The Poisson distribution is also used in problems dealing with the inspection of manufactured products where the probability that any one piece is defective is very small and the lots are very large. In general, the Poisson distribution explains the behaviour of those discrete variates where the probability of occurrence of the event is small and the total number of possible cases is sufficiently large.


Illustration 10. Suppose on an average 1 house in 1,000 in a certain district has a fire during a year. If there are 2,000 houses in that district, what is the probability that exactly 5 houses will have a fire during the year?

Solution. Applying the Poisson distribution,

m = np, where n = 2000 and p = 1/1000

∴ m = np = 2000 × 1/1000 = 2

$$P(r) = \frac{e^{-m}m^r}{r!}$$

Here m = 2, r = 5 and e = 2.7183.

$$P(5) = \frac{e^{-2}\times2^5}{5!} = \frac{\text{Rec. [antilog}(2\times\log 2.7183)]\times32}{5\times4\times3\times2\times1}$$

$$= \frac{\text{Rec. [antilog}(2\times0.4343)]\times32}{120} = \frac{\text{Rec. [antilog } 0.8686]\times32}{120}$$

$$= \frac{\text{Rec. } [7.389]\times32}{120} = \frac{0.1353\times32}{120} = 0.036$$

(Rec. denotes the reciprocal.)
Illustration 11. Ten per cent of the tools produced in a certain manufacturing process turn out to be defective. Find the probability that in a sample of 10 tools chosen at random, exactly two will be defective by using (a) the binomial distribution, (b) the Poisson approximation to the binomial distribution.
Solution. Probability of a defective tool, p = 0.1

(a) When the binomial distribution is used, the probability of 2 defectives in 10 is given by

$${}^{10}C_2(0.1)^2(0.9)^8 = 45\times0.01\times0.4305 = 0.1937 \text{ or } 0.19$$

(b) When the Poisson distribution is used, the probability of 2 defectives is given by

$$P(2) = \frac{e^{-m}m^2}{2!} = \frac{e^{-1}\times1^2}{2} = \frac{0.3679}{2} = 0.1839 \text{ or } 0.18$$

where m = np = 10(0.1) = 1.

In general the approximation is good if p ≤ 0.1 and m = np < 5.

*Taro Yamane: Statistics: An Introductory Analysis, p. 556.
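Both parts of Illustration 11 can be computed side by side in Python:

```python
from math import comb, exp, factorial

n, p, r = 10, 0.1, 2
binomial = comb(n, r) * p ** r * (1 - p) ** (n - r)  # part (a)
m = n * p                                            # m = np = 1
poisson = exp(-m) * m ** r / factorial(r)            # part (b)
print(round(binomial, 4), round(poisson, 4))  # 0.1937 0.1839
```

With n as small as 10 the Poisson value is only a rough approximation; the agreement improves as n grows with np held fixed.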


Fitting a Poisson Distribution 


The process of fitting a Poisson distribution is very simple. We have just to obtain the value of m, i.e., the average occurrence, and calculate the frequency of 0 successes. The other frequencies can be very easily calculated as follows:

$N(P_0) = Ne^{-m}$

$N(P_1) = N(P_0)\times\dfrac{m}{1}$

$N(P_2) = N(P_1)\times\dfrac{m}{2}$

$N(P_3) = N(P_2)\times\dfrac{m}{3}$, etc.

A 'goodness-of-fit' test will confirm whether or not the fit is close enough to justify the belief that the distribution is of the Poisson type.


Illustration 12. The following mistakes per page were observed in a book:

No. of mistakes per page    No. of times the mistake occurred
0                           211
1                           90
2                           19
3                           5
4                           0
Total                       325

Fit a Poisson distribution to the data and test the goodness of fit.*
(M.A. Econ., Punjab, 1968; M.A. Econ., Delhi, 1969; M.Com., Allahabad, 1972)

Solution: FITTING POISSON DISTRIBUTION

x     f       fX
0     211     0
1     90      90
2     19      38
3     5       15
4     0       0
      N = 325    ΣfX = 143
П 
*For testing goodness of fit, please refer to chapter on X? test and goodness of fit. 
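Mean of the distribution, $m = \frac{\Sigma fx}{N} = \frac{143}{325} = 0.44$

$$
\begin{aligned}
P_0 = e^{-m} &= 2.7183^{-0.44} \\
&= \text{Rec.}\,[\text{antilog}\,(\log 2.7183 \times 0.44)] \\
&= \text{Rec.}\,[\text{antilog}\,(0.4343 \times 0.44)] \\
&= \text{Rec.}\,[\text{antilog}\,0.1911] \\
&= \text{Rec.}\,1.552 = 0.6444
\end{aligned}
$$

$N(P_0) = 0.6444 \times 325 = 209.43$

$N(P_1) = N(P_0) \times \frac{m}{1} = 209.43 \times 0.44 = 92.15$

$N(P_2) = N(P_1) \times \frac{m}{2} = 92.15 \times 0.22 = 20.27$

$N(P_3) = N(P_2) \times \frac{m}{3} = 20.27 \times \frac{0.44}{3} = 6.76 \times 0.44 = 2.97$

$N(P_4) = N(P_3) \times \frac{m}{4} = 2.97 \times \frac{0.44}{4} = 2.97 \times 0.11 = 0.33$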
The expected frequencies of the Poisson distribution are:

   x       0        1       2       3      4     Total
   f    209.43    92.15   20.27   2.97   0.33   325.15


Note. A rough check on the accuracy of the result is that the total of the expected
frequencies should be equal to the total of the observed frequencies. For example, in
the above case the total of expected frequencies is 325.15 and the observed total is 325.
The slight difference is due to approximation.

Illustration 13. In a certain factory turning out optical lenses, there is a small
chance, 1/500, for any one lens to be defective. The lenses are supplied in packets of
10. Use the Poisson distribution to calculate the approximate number of packets
containing no defective, one defective, two defective, three defective and four defective
lenses, respectively, in a consignment of 20,000 packets. You are given that
$e^{-0.02} = 0.9802$. (M. Com., Delhi, 1970 ; M. Com., Meerut, 1973)
Solution. $N = 20{,}000$, $n = 10$, $p = \frac{1}{500}$

$$m = np = 10 \times \frac{1}{500} = 0.02$$

$$P_0 = e^{-m} = e^{-0.02} = 0.9802 \text{ (given)}$$

$N(P_0) = 0.9802 \times 20{,}000 = 19{,}604$

$N(P_1) = N(P_0) \times m = 19{,}604 \times 0.02 = 392.08$ or 392

$N(P_2) = N(P_1) \times \frac{m}{2} = 392 \times 0.01 = 3.92$ or 4

$N(P_3) = N(P_2) \times \frac{m}{3} = 4 \times \frac{0.02}{3} = 0.027 \approx 0$

$N(P_4)$ will also be zero.

Number of packets containing no defective lens = 19,604

Number of packets containing one defective lens = 392

Number of packets containing two defective lenses = 4

Number of packets containing more than two defective lenses = 0.
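The same recurrence, applied to the lens problem, can be sketched in Python as follows. This is an illustration only and the variable names are our own. It calls `math.exp(-0.02)` directly instead of using the given value 0.9802. The two agree to four places.

```python
import math

N_PACKETS = 20_000
m = 10 * (1 / 500)   # m = np = 0.02 defective lenses per packet

# Recurrence: N(P0) = N e^(-m), N(Pr) = N(P(r-1)) * m / r
counts = [N_PACKETS * math.exp(-m)]
for r in range(1, 5):
    counts.append(counts[-1] * m / r)

for r, c in enumerate(counts):
    print(f"{r} defective: {c:.2f} packets (about {round(c)})")
```

Rounded to whole packets, this gives 19,604, 392, 4, 0 and 0, the same as the hand calculation.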

Illustration 14. Suppose that a manufactured product has 2 defects per unit of
product inspected. Using the Poisson distribution, calculate the probabilities of finding a
product without any defect, with 3 defects and with 4 defects. (Given $e^{-2} = 0.135$)
(M. Com., Raj. 1973)


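Solution. Average number of defects, $m = 2$

$$P(r) = \frac{e^{-m}\, m^r}{r!}$$

$P(0) = e^{-2} = 0.135$ (given)

$P(1) = P(0) \times m = 0.135 \times 2 = 0.27$

$P(2) = P(1) \times \frac{m}{2} = 0.27 \times \frac{2}{2} = 0.27$

$P(3) = P(2) \times \frac{m}{3} = 0.27 \times \frac{2}{3} = 0.18$

$P(4) = P(3) \times \frac{m}{4} = 0.18 \times \frac{2}{4} = 0.09$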
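For comparison, the probabilities can also be computed directly from $P(r) = e^{-m} m^r / r!$, without the recurrence. Here is a minimal Python sketch. The helper name is illustrative, and the sketch uses the exact value of $e^{-2}$ rather than the rounded 0.135.

```python
import math

def poisson_pmf(r: int, m: float) -> float:
    """Direct Poisson formula P(r) = e^(-m) * m^r / r!."""
    return math.exp(-m) * m**r / math.factorial(r)

m = 2
for r in (0, 3, 4):
    print(f"P({r}) = {poisson_pmf(r, m):.4f}")
```

With the exact $e^{-2} = 0.1353$, the results are about 0.1353, 0.1804 and 0.0902. These agree with 0.135, 0.18 and 0.09 above.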
Illustration 15. Below are given the number of vacancies of judges occurring
in a High Court over a period of 96 years:

No. of vacancies    0    1    2    3    Total
Frequency          59   27    9    1     96

Fit a Poisson distribution to represent the frequencies of vacancies per year,
and without making a significance test, state whether you regard the observations as
agreeing satisfactorily with this. (M. Com. Meerut, 1976)
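Solution. FITTING POISSON DISTRIBUTION

   x      f      fx
   0     59       0
   1     27      27
   2      9      18
   3      1       3
       N = 96   Σfx = 48

$$m = \frac{\Sigma fx}{N} = \frac{48}{96} = 0.5$$

$$
\begin{aligned}
P_0 = e^{-m} = e^{-0.5} &= 2.7183^{-0.5} \\
&= \text{Rec.}\,[\text{antilog}\,(\log 2.7183 \times 0.5)] \\
&= \text{Rec.}\,[\text{antilog}\,(0.4343 \times 0.5)] \\
&= \text{Rec.}\,[\text{antilog}\,0.21715] \\
&= \text{Rec.}\,1.649 = 0.6065
\end{aligned}
$$

$N(P_0) = 96 \times 0.6065 = 58.224$ or 58.2

$N(P_1) = N(P_0) \times m = 58.2 \times 0.5 = 29.10$

$N(P_2) = N(P_1) \times \frac{m}{2} = 29.1 \times 0.25 = 7.275$

$N(P_3) = N(P_2) \times \frac{m}{3} = 7.275 \times \frac{0.5}{3} = 2.425 \times 0.5 = 1.2125$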
Thus the observed and expected frequencies are:

   x     f_o     f_e
   0      59     58.2
   1      27     29.1
   2       9      7.3
   3       1      1.2


Since the difference between observed and expected frequencies is very small, the 
observations agree quite satisfactorily with expectation. 


III. NORMAL DISTRIBUTION*


The binomial and the Poisson distributions described above are the
most useful theoretical distributions for discrete variables, i.e., they relate
to the occurrence of distinct events. In order to have a mathematical
distribution suitable for dealing with quantities whose magnitude is
continuously variable, a continuous distribution is needed. The normal
distribution, also called the normal probability distribution, happens to
be the most useful theoretical distribution for continuous variables. Many
statistical data concerning business and economic problems are displayed
in the form of a normal distribution. In fact, the normal distribution is the
cornerstone of modern statistics.


The normal distribution was first discovered by De Moivre in 1733 as the
limiting form of the binomial model. It was also known to Laplace not
later than 1774, but through a historical error it has been credited to
Gauss, who first made reference to it in 1809. Throughout the 18th and
19th centuries, various efforts were made to establish the normal model as
the underlying law ruling all continuous random variables; thus the name
normal. These efforts failed because of false premises. The normal model
has, nevertheless, become the most important probability model in
statistical analysis.

*The term normal is somewhat unfortunate since it suggests that there is some-
thing abnormal about other types of distribution. This is incorrect, and the normal
distribution should be regarded simply as one of the several types of theoretical distri-
butions. (Griffin: Statistics, p. 127.)


The normal distribution is an approximation to the binomial distri-
bution. Whether or not p is equal to q, the binomial distribution tends
to the form of a continuous curve when n becomes large, at least
for the material part of the range. As a matter of fact, the correspon-
dence between the binomial and the curve is surprisingly close even for
comparatively low values of n, provided that p and q are fairly near
equality. The limiting frequency curve obtained as n becomes large is
called the normal frequency curve or simply the normal curve.


The normal curve is represented in several forms. The following is
the basic form relating to the curve having unit area:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

where $y$ = the computed height of an ordinate at a distance of $x$ from the mean,
$\sigma$ = standard deviation of the given normal distribution,
$\pi$ = the constant 3.1416; $\sqrt{2\pi} = 2.5066$,
$e$ = the constant 2.7183 (the base of the system of natural logarithms),
$x = (X - \bar{X})$, i.e., $x$ is the stated value of the variable expressed as a
deviation from the mean.


When we say that the curve has unit area we mean that the total
frequency N is equated to 1 for convenience in representation and
calculation. To obtain ordinates for a particular distribution, the ordinates
given by the above formula are multiplied by N. The equation to a
normal curve corresponding to a particular distribution is thus given by

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

The quantity $\frac{N}{\sigma\sqrt{2\pi}}$ in the above formula is equal to the maximum
ordinate ($y_0$) of the normal curve corresponding to a distribution of stated
total frequency $N$ and stated standard deviation $\sigma$.
Having $y_0$, we may use the following form of the equation of the
normal curve:

$$y = y_0\, e^{-x^2/2\sigma^2}$$

Thus the ordinate at any stated distance $x$ from the maximum
ordinate may be determined by multiplying the maximum ordinate by the
quantity $e^{-x^2/2\sigma^2}$. In a normal distribution mean, median and mode
coincide. The maximum ordinate is, therefore, the ordinate at that point
on the x-scale at which these three identical values fall. The normal
distribution has been so thoroughly tabulated that few persons ever need
to make use of its formula.
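The three forms of the equation can be written as one small Python function. This is a sketch under the definitions above, with $x$ measured from the mean and $N$ the total frequency. The name `normal_ordinate` is hypothetical.

```python
import math

def normal_ordinate(x: float, sigma: float, n_total: float = 1.0) -> float:
    """Height of the normal curve at deviation x = X - mean.

    With n_total = 1 this is the unit-area curve; otherwise the ordinate is
    scaled to a distribution of total frequency N.
    """
    y0 = n_total / (sigma * math.sqrt(2 * math.pi))   # maximum ordinate
    return y0 * math.exp(-x**2 / (2 * sigma**2))

# Hypothetical distribution: N = 1000, sigma = 10
print(f"{normal_ordinate(0, 10, 1000):.2f}")    # the maximum ordinate y0
print(f"{normal_ordinate(10, 10, 1000):.2f}")   # ordinate one sigma from the mean
```

For this made-up distribution, the maximum ordinate is about 39.89. One standard deviation away, the ordinate is $y_0 e^{-1/2}$, roughly 60.65% of the maximum.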
Relation between Binomial, Poisson and Normal Distributions


The three distributions, namely, binomial, Poisson and normal, are
very closely related to each other. As explained earlier, when n is large
while the probability p of the occurrence of an event is close to zero, so
that q = (1 − p) is close to 1, the binomial distribution is very closely
approximated by the Poisson distribution with m = np.


Since there is a relation between the binomial and normal distri-
butions, it follows that there is also a relation between the Poisson and
normal distributions. In fact, it can be proved that the Poisson distri-
bution approaches a normal distribution with standardized variable
$\frac{x - m}{\sqrt{m}}$ as $m$ increases indefinitely.

Importance of the Normal Distribution

The normal distribution has long occupied a central place in the
theory of statistics. Its importance will be clear from the following
points:

1. The normal distribution has the remarkable property stated
in the so-called central limit theorem, which asserts that certain statistics,
the most important of which is the arithmetic mean, tend to be normally
distributed as the sample size becomes large. Thus, if samples of large
size, n, are drawn from a population that is not normally distributed,
the successive sample means will nevertheless form a distribution of their own
that is approximately normal. As the size of the sample is increased, the
sample means tend more and more closely to be normally distributed. The normal
distribution has become an indispensable part of the theory of sampling, and as a
result the work on statistical inference is made easier.

This characteristic makes it possible to determine the minimum and
maximum limits within which the population values lie. For example,
within a range of population mean ± 3σ, 99.73% or almost all the items
are covered.

2. As n becomes large, the normal distribution serves as a good
approximation of many discrete distributions (such as the binomial or the
Poisson model) whenever the exact discrete probability is laborious to
obtain or impossible to calculate accurately.

3. In theoretical statistics many problems can be solved only under
the assumption of a normal population. In applied work as well, we
often find that methods developed under the normal probability law yield
satisfactory results even when the assumption of a normal population is
not fully met, despite the fact that the problem can have a formal solution
only if such a premise is hypothesized.

4. The normal distribution has numerous mathematical properties
which make it popular and comparatively easy to manipulate. For
example, the moments of the normal distribution are expressed in simple
form. The normal curve is reasonably close to many distributions of
the humped type. If, therefore, we are ignorant of the exact nature of a
humped distribution, or know the form but find it mathematically intract-
able, we may assume as a first approximation that the distribution is
normal and see where this assumption leads us.
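Point 1 above, the central limit theorem, can be illustrated with a small simulation. The sketch below illustrates the idea and is not a proof. The population (exponential with mean 1 and standard deviation 1, which is markedly skewed), the sample size of 50 and the seed are arbitrary choices of ours. The sketch checks three things: the sample means centre on 1, their spread is close to the theoretical standard error $1/\sqrt{50} \approx 0.141$, and nearly all of them fall within three standard errors of the population mean.

```python
import math
import random
import statistics

random.seed(12345)  # fixed seed so the run is reproducible

N_SAMPLES = 2000    # number of samples drawn
SAMPLE_SIZE = 50    # n, the size of each sample

def sample_mean(n: int) -> float:
    """Mean of n draws from an exponential population with mean 1 (skewed, not normal)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]
centre = statistics.fmean(means)
spread = statistics.stdev(means)
se = 1 / math.sqrt(SAMPLE_SIZE)   # theoretical standard error of the mean
within_3se = sum(abs(x - 1) <= 3 * se for x in means) / N_SAMPLES

print(f"mean of sample means  = {centre:.3f}")
print(f"sd of sample means    = {spread:.3f} (theory {se:.3f})")
print(f"share within 1 ± 3 SE = {within_3se:.3f}")
```

Although the parent population is far from normal, the share of sample means within three standard errors comes out close to the normal-curve figure of 99.73%.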



A contemporary statistician W.J. Youden, whose hobby is typography,
expresses his admiration of the normal distribution as follows:

THE
NORMAL
LAW OF ERROR
STANDS OUT IN THE
EXPERIENCE OF MANKIND
AS ONE OF THE BROADEST
GENERALISATIONS OF NATURAL
PHILOSOPHY. IT SERVES AS THE
GUIDING INSTRUMENT IN RESEARCHES,
IN THE PHYSICAL AND SOCIAL SCIENCES AND
IN MEDICINE, AGRICULTURE AND ENGINEERING.
IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE
INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION
AND EXPERIMENT.

Artistically enough, it gives us the shape of the normal curve also.


Properties of the Normal Distribution

The following are the important properties of the normal curve and 
the normal distribution : 

1. The normal curve is symmetrical about the mean (skewness=0). 
If the curve were folded along its vertical axis, the two halves would coin- 
cide. The number of cases below the mean in a normal distribution is 
equal to the number of cases above the mean, which makes the mean and 
median coincide. The height of the curve for a positive deviation of 3 
units is the same as the height of the curve for negative deviation of 3 
units. 

2. The height of the normal curve is at its maximum at the mean. 
Hence the mean and mode of the normal distribution coincide. Thus for 
a normal distribution mean, median and mode are all equal. 

3. There is one maximum point of the normal curve which occurs 
at the mean. The height of the curve declines as we go in either direction 
from the mean. The curve approaches nearer and nearer to the base but 
it never touches it, i.e., the curve is asymptotic to the base on either side.
Hence its range is unlimited or infinite in both directions. 

4. Since there is only one maximum point, the normal curve is
unimodal, i.e., it has only one mode.

5. The points of inflection occur at ± σ from the mean, i.e., at $\bar{X} \pm \sigma$.

6. As distinguished from Binomial and Poisson distributions where 
the variable is discrete, the variable distributed according to the normal 
curve is a continuous one. 

7. The first and third quartiles are equidistant from the median. 

8. The mean deviation is 4/5th or, more precisely, 0.7979 of the
standard deviation.

9. The area under the normal curve is distributed as follows:

(a) Mean ± 1σ covers 68.27% area; 34.135% area will lie on
either side of the mean.

(b) Mean ± 2σ covers 95.45% area.

(c) Mean ± 3σ covers 99.73% area.
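Properties 8 and 9 can be checked numerically. The area within mean ± kσ equals $\operatorname{erf}(k/\sqrt{2})$, and the mean deviation about the mean equals $\sigma\sqrt{2/\pi}$. The following is a minimal Python sketch using only the `math` module. The function name is illustrative.

```python
import math

def central_area(k: float) -> float:
    """Proportion of a normal distribution lying within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} sigma covers {100 * central_area(k):.2f}% of the area")

# Property 8: mean deviation about the mean equals sigma * sqrt(2/pi)
md_ratio = math.sqrt(2 / math.pi)
print(f"mean deviation / sigma = {md_ratio:.4f}")
```

It should report approximately 68.27%, 95.45% and 99.73%, and a mean-deviation ratio of 0.7979.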
The following table shows the area of the normal curve between the
mean ordinate and ordinates at various sigma distances from the mean, as a
percentage of the total area.

AREA RELATIONSHIP

Distance from the       Percentage of
Mean Ordinate           Total Area
    0.5σ                  19.146
    1.0σ                  34.134
    1.5σ                  43.319
    1.96σ                 47.500
    2.0σ                  47.725
    2.5σ                  49.379
    2.5758σ               49.500
    3.0σ                  49.865


Thus the two ordinates at distance 1.96σ from the mean on either
side would enclose 47.5 + 47.5 = 95% of the total area, and two ordinates
at 2.5758σ distance from the mean on either side would enclose 49.5 + 49.5
= 99% of the total area. The area enclosed between ordinates at 3σ dis-
tance from the mean on either side would be 49.865 + 49.865 = 99.73% of
the total area. The various hypotheses are generally tested either at the 5%
level or at the 1% level (i.e., taking into account 95% and 99% of the total
area of the normal curve).
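The table above can be reproduced from the standard normal distribution function. This sketch uses Python's `statistics.NormalDist`, which is in the standard library from Python 3.8. The helper name `area_from_mean` is ours. The sketch prints the area between the mean ordinate and each listed distance, as a percentage.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal: mean 0, standard deviation 1

def area_from_mean(z: float) -> float:
    """Area between the mean ordinate and the ordinate at z (z >= 0)."""
    return STD.cdf(z) - 0.5

for z in (0.5, 1.0, 1.5, 1.96, 2.0, 2.5, 2.5758, 3.0):
    print(f"{z:<7} {100 * area_from_mean(z):.3f}")
```

Doubling an entry gives the two-sided coverage used for the 5% and 1% levels. For example, 2 × 47.500 = 95% and 2 × 49.500 = 99%.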


Conditions for Normality* 


The following four conditions must prevail among the factors affect-
ing the individual events that make up a given population, if the distribu-
tion of observations is to be normal:

1. The causal forces must be numerous and of approximately equal
weight.

2. These forces must be the same over the universe from which the
observations are drawn (although their incidence will vary from event to
event). This is the condition of homogeneity.


above the population mean are balanced as to m 
deviations below the mean. This is the conditio; 


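Constants of the Normal Distribution

The mean of the normal distribution is $\bar{X}$ and its standard deviation is $\sigma$.
Other constants of the distribution are

$$\mu_2 = \sigma^2, \qquad \mu_3 = 0 \qquad \text{and} \qquad \mu_4 = 3\sigma^4$$

$\beta_1$, or the moment coefficient of skewness:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0$$

Hence the normal distribution is perfectly symmetrical.

$\beta_2$, or the moment coefficient of kurtosis:

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3\sigma^4}{\sigma^4} = 3$$

*F.C. Mills: Statistical Methods, pp. 156-57.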
For a normal distribution the value of $\beta_2$ shall be 3. If the value
of $\beta_2$ is more than 3, the curve is leptokurtic, and if the value of $\beta_2$ is less
than 3, the curve is platykurtic.
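The constants $\mu_2 = \sigma^2$, $\mu_3 = 0$ and $\mu_4 = 3\sigma^4$, and hence $\beta_1 = 0$ and $\beta_2 = 3$, can be confirmed by numerical integration. This is a minimal Python sketch. It uses a plain midpoint Riemann sum over mean ± 12σ, which is our choice of method. The value σ = 2 is arbitrary, since $\beta_1$ and $\beta_2$ do not depend on it.

```python
import math

def normal_central_moment(k: int, sigma: float, steps: int = 20_000) -> float:
    """k-th central moment of a normal distribution, by a midpoint Riemann sum.

    The density is integrated over mean ± 12 sigma; the tails beyond that
    contribute a negligible amount.
    """
    lo, hi = -12 * sigma, 12 * sigma
    h = (hi - lo) / steps
    norm = 1 / (sigma * math.sqrt(2 * math.pi))
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h   # deviation from the mean at the strip midpoint
        total += x**k * norm * math.exp(-x * x / (2 * sigma * sigma))
    return total * h

sigma = 2.0
mu2, mu3, mu4 = (normal_central_moment(k, sigma) for k in (2, 3, 4))
beta1 = mu3**2 / mu2**3
beta2 = mu4 / mu2**2
print(f"mu2 = {mu2:.6f}, mu4 = {mu4:.6f}, beta1 = {beta1:.2e}, beta2 = {beta2:.6f}")
```

With σ = 2 the sums give $\mu_2 \approx 4$ and $\mu_4 \approx 48 = 3 \times 2^4$, so $\beta_2 \approx 3$ and $\beta_1 \approx 0$.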

Area under the Normal Curve 

The equation of the normal curve gives the ordinate of the curve 
corresponding to any given value of x. However, we are usually interested
in areas under the normal curve instead of its ordinate. The area
under the curve gives us the proportion of the cases falling between two 
numbers or the probability of getting a value between two numbers. The 
areas under the normal curve are tabulated and are shown in the Table 
at the end of the book. 

Before discussing the use of the table it is necessary to understand
the meaning of the normal curve in its standard form. The equation of
the normal curve depends on $\bar{X}$ and $\sigma$, and for different values of $\bar{X}$ and $\sigma$
we will obtain different curves. This would necessitate separate tables of
normal curve areas for each pair of values of $\bar{X}$ and $\sigma$. Fortunately this
"impossible" task is not required. We will be able to determine normal
curve areas regardless of $\bar{X}$ and $\sigma$ by tabulating only the areas under the
normal curve having $\bar{X} = 0$ and $\sigma = 1$.

Such a normal curve with 0 mean and unit standard deviation is 
known as the standard normal curve. 


[Figure: The normal distribution. The X-scale runs from $\bar{X} - 3\sigma$ to $\bar{X} + 3\sigma$ and the corresponding z-scale from −3 to +3; 68.27% of the area lies between z = −1 and z = +1.]

A normal curve with mean $\bar{X}$ and standard deviation $\sigma$ can be con-
verted into a standard normal distribution by performing the change of
scale and origin as indicated above. In the original scale (the X-scale)
the mean and standard deviation are $\bar{X}$ and $\sigma$; in the new scale (the
z-scale) they are 0 and 1. The formula that enables us to change from the
X-scale to the z-scale and vice versa is

$$z = \frac{X - \bar{X}}{\sigma} = \frac{x}{\sigma}$$

where $x = (X - \bar{X})$.
This transformation from X to z is named the z transformation and
has the effect of reducing X to units in terms of standard deviation.


In using the normal curve in practical work in the ways described in
this chapter, we must be sure that the distribution approximates normality
or tends towards normality. It must be remembered that "normal" does
not mean the general type of distribution to be expected. Rather,
"normal" means the type that frequency distributions of certain variables
tend toward when there are a large number of items. If there is good
reason to suppose that the distribution of a given variable would approach
normality if there were a greater number of items, then the principles that
hold for a normal curve can prove of great value in analysing the distri-
bution of deviations. In other words, given a value of X, the corresponding
z value tells us how far away, and in what direction, X is from its mean μ
in terms of its standard deviation σ. For example, z = 1.8 means that the
value of X is 1.8σ to the right of μ. Similarly, z = −2.2 means that the
particular X value is 2.2σ to the left of μ.

Since the various frequencies with which the values of a variable
occur in the population can be expressed as proportions by taking $\frac{f}{N}$,
and since $\Sigma \frac{f}{N} = 1$, we can make the area under the curve corresponding
to a normal distribution equal to unity, regardless of the particular number
of observations involved. We thus have a normal distribution that is independent of
N, $\bar{X}$ and $\sigma$. The normal distribution in this form is called the unit normal curve:

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

where $z = \frac{X - \bar{X}}{\sigma}$.

The area under this curve is equal to 1. The curve is also called the
standard probability curve. As z increases numerically, y becomes
smaller and smaller without ever becoming equal to zero. This means
that the curve approaches the horizontal axis but never touches it.
Entries corresponding to negative values of z are unnecessary because
the normal curve is symmetrical. As a result, the probability that z lies
between, say, −1.20 and zero is equal to the probability that z lies between
zero and 1.20. In other words, $p(0 \le z \le 1.20) = p(-1.20 \le z \le 0)$.


The following are some examples to illustrate how the tables are
to be consulted in order to obtain the area under the normal curve:

Illustration 16. Find the area under the normal curve for z = 1.54.

Solution. If we look in the table, the entry corresponding to z = 1.54 is
0.4382; this measures the shaded area in the following figure between z = 0 and
z = 1.54.

The table given at the end of the book does not contain entries corresponding
to negative values of z. But since the curve is symmetrical, we can find the area
between z = 0 and z = −1.54 by looking up the area corresponding to z = 1.54.


If we wish to find the area under the normal curve to the right of a positive value of
z, we should subtract the tabular value from 0.5000. The reason is that the normal
curve is symmetrical, so the area to the right of the mean is 0.5000 and the area to the right
of a positive value of z is 0.5000 minus the tabular value given for z.


Illustration 17. Find the area to the right of z = 0.25.

Solution. Subtract 0.0987 (the entry given in the table for z = 0.25) from 0.5000,
getting (0.5000 − 0.0987) = 0.4013, as shown below.


If we wish to find the area to the left of a positive value of z, we add 0.5000
to the tabular value given for z.


Illustration 18. Find the area to the left of z = 1.96.

Solution. This is the shaded area shown below, i.e., equal to
(0.4750 + 0.5000) = 0.9750.
In some problems we may be interested in determining the area between two
values of z. If both z's are on the same side of the mean, i.e., if they are both
positive or both negative, the area between them is given by the difference of their
tabular values. For example, the area between z = 0.60 and z = 1.80 is (0.4641 − 0.2257) =
0.2384, as shown below.


If the two z's are on opposite sides of the mean, the area between
them is given by the sum of their tabular values. For example, the area
between z = −0.4 and z = 0.6 is (0.1554 + 0.2257) = 0.3811, as shown below.
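The table-reading rules of Illustrations 16 to 18 and the two cases just described can be collected in a few helper functions. This is a minimal sketch using `statistics.NormalDist`, and the function names are our own. Working directly with the cumulative area lets `area_between` handle the same-side and opposite-side cases with one expression, instead of adding or subtracting tabular values.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal: mean 0, standard deviation 1

def table_area(z: float) -> float:
    """Area between the mean ordinate (z = 0) and z, as printed in the table."""
    return abs(STD.cdf(z) - 0.5)

def area_right(z: float) -> float:
    """Area to the right of z."""
    return 1 - STD.cdf(z)

def area_left(z: float) -> float:
    """Area to the left of z."""
    return STD.cdf(z)

def area_between(z1: float, z2: float) -> float:
    """Area between two ordinates, whichever side of the mean they lie on."""
    lo, hi = sorted((z1, z2))
    return STD.cdf(hi) - STD.cdf(lo)

print(f"{table_area(1.54):.4f}")          # Illustration 16
print(f"{area_right(0.25):.4f}")          # Illustration 17
print(f"{area_left(1.96):.4f}")           # Illustration 18
print(f"{area_between(0.60, 1.80):.4f}")  # same side of the mean
print(f"{area_between(-0.4, 0.6):.4f}")   # opposite sides of the mean
```

The last two results (about 0.2383 and 0.3812) differ from 0.2384 and 0.3811 only in the fourth decimal, because the printed table rounds each entry.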


At times we may be asked to determine the area between two given values of X.

Illustration 19. A normal curve has $\bar{X} = 20$ and $\sigma = 10$. Find the area between
$X_1 = 15$ and $X_2 = 40$.

Solution:

$$z_1 = \frac{X_1 - \bar{X}}{\sigma} = \frac{15 - 20}{10} = -0.5, \qquad z_2 = \frac{X_2 - \bar{X}}{\sigma} = \frac{40 - 20}{10} = +2$$

Consulting the table, we find the areas corresponding to the z's are 0.1915 and
0.4772, and thus the desired area between $X_1 = 15$ and $X_2 = 40$ is (0.1915 + 0.4772) or
0.6687, as shown below.


Applications of the Normal Distribution 
The normal distribution is mostly used for the following purposes:

1. To approximate or "fit" a distribution of measurement under
certain conditions.

2. To approximate the binomial distribution and other discrete or
continuous probability distributions under suitable conditions.

3. To approximate the distribution of means and certain other
quantities calculated from samples, especially large samples.

The following examples illustrate the applications of the normal
distribution:


Illustration 20. Assume the mean height of soldiers to be 68.22 inches with a
variance of 10.8 square inches. How many soldiers in a regiment of 1,000 would you expect
to be over six feet tall? (M. Com., Meerut, 1974)

Solution. Assume that the distribution of height is normal.

Standard normal variate, $z = \frac{X - \bar{X}}{\sigma}$

Here $X = 72$ inches, $\bar{X} = 68.22$, $\sigma = \sqrt{10.8} = 3.286$

$$z = \frac{72 - 68.22}{3.286} = 1.15$$

Area to the right of the ordinate at 1.15 from the normal table is (0.5000 − 0.3749)
or 0.1251. Hence the probability of getting a soldier above six feet is 0.1251, and out of
1,000 soldiers the expectation is 0.1251 × 1,000 = 125.1 or 125. Thus the expected number
of soldiers over six feet tall = 125.
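The same calculation can be done without tables. This is a minimal Python sketch using `statistics.NormalDist`. Like the text, it assumes heights are normally distributed. The variable names are illustrative.

```python
import math
from statistics import NormalDist

heights = NormalDist(mu=68.22, sigma=math.sqrt(10.8))   # heights in inches, assumed normal
p_over_six_feet = 1 - heights.cdf(72)                   # area to the right of X = 72
expected = 1000 * p_over_six_feet                       # regiment of 1,000 soldiers
print(f"P(height > 72 in) = {p_over_six_feet:.4f}; expected soldiers = {expected:.1f}")
```

It gives a probability of about 0.1250 and about 125 soldiers, matching the table-based answer.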


Illustration 21. In a distribution exactly normal, 7% of the items are under 35
and 89% are under 63. What are the mean and standard deviation
of the distribution?
(B.Sc. Math., Delhi, 1973)

Solution. Since 7% of items are under 35, 43% are between $\bar{X}$ and 35. Simi-
larly, the percentage of items between $\bar{X}$ and 63 is 39%.

The standard normal variate corresponding to 0.43 (43%) is 1.48. Thus

$$\frac{35 - \bar{X}}{\sigma} = -1.48 \quad \ldots (i)$$

The standard normal variate corresponding to 0.39 (39%) is 1.23. Thus

$$\frac{63 - \bar{X}}{\sigma} = +1.23 \quad \ldots (ii)$$

From (i) and (ii):

$$1.48\sigma - \bar{X} = -35$$
$$1.23\sigma + \bar{X} = 63$$

On addition of these equations, we get

$$2.71\sigma = 28, \qquad \sigma = \frac{28}{2.71} = 10.33$$

Substituting in (i):

$$1.48 \times 10.33 - \bar{X} = -35, \qquad \bar{X} = 35 + 15.3 = 50.3$$

Hence the mean of the distribution is 50.3 and σ = 10.33.
Illustration 22. In a normal distribution 31% of the items are under 45 and 8%
are over 64. Find the mean and standard deviation of the distribution.
(.4.8., 1964 ; M. A. Econ., Delhi, 1973) 
Solution. Let the mean be $\bar{X}$ and the standard deviation $\sigma$. 31% of the items are under
45, so the area lying to the left of the ordinate at X = 45 is 0.31 and, therefore, the area lying
to the right of that ordinate up to the mean is (0.5 − 0.31) = 0.19. The value of z corres-
ponding to this area is 0.5. Hence

$$\frac{\bar{X} - 45}{\sigma} = 0.5 \quad \text{or} \quad 0.5\sigma = \bar{X} - 45 \quad \ldots (i)$$

8% of the items are above 64. Therefore, the area to the right of the ordinate at
64 is 0.08. The area to the left of the ordinate at X = 64 up to the mean ordinate is
(0.5 − 0.08) = 0.42, and the value of z corresponding to this area is 1.4. Hence

$$\frac{64 - \bar{X}}{\sigma} = 1.4 \quad \text{or} \quad 1.4\sigma = 64 - \bar{X} \quad \ldots (ii)$$

Adding equations (i) and (ii):

$$1.9\sigma = 19, \qquad \sigma = 10$$
$$\bar{X} - 0.5 \times 10 = 45, \qquad \bar{X} = 50$$

The mean of the distribution is 50 and the standard deviation is 10.
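Illustrations 21 and 22 both solve two linear equations in $\bar{X}$ and $\sigma$ from two known cumulative proportions. A generic version is sketched below. The function name is hypothetical, and `NormalDist().inv_cdf` takes the place of reading the table backwards. The third call applies the same idea to the data of Illustration 31 below, where 30% are under 40 and 10% score 75 or more.

```python
from statistics import NormalDist

def mean_sd_from_percentiles(x1, p1, x2, p2):
    """Mean and sd of a normal distribution with P(X < x1) = p1 and P(X < x2) = p2."""
    z1 = NormalDist().inv_cdf(p1)   # standard normal variate for x1
    z2 = NormalDist().inv_cdf(p2)   # standard normal variate for x2
    sd = (x2 - x1) / (z2 - z1)      # subtract x1 = mean + z1*sd from x2 = mean + z2*sd
    mean = x1 - z1 * sd
    return mean, sd

ill21 = mean_sd_from_percentiles(35, 0.07, 63, 0.89)   # 7% under 35, 89% under 63
ill22 = mean_sd_from_percentiles(45, 0.31, 64, 0.92)   # 31% under 45, 8% over 64
ill31 = mean_sd_from_percentiles(40, 0.30, 75, 0.90)   # 30% under 40, 10% at 75 or more
for name, (m, s) in [("21", ill21), ("22", ill22), ("31", ill31)]:
    print(f"Illustration {name}: mean = {m:.2f}, sd = {s:.2f}")
```

The results are roughly 50.29 and 10.36, 49.96 and 9.99, and 50.16 and 19.38. The small gaps from the hand-worked answers come from the two- or three-decimal z values read from the table.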
Illustration 23. The income of a group of 10,000 persons was found to be
normally distributed with mean = Rs. 750 p.m. and standard deviation = Rs. 50. Show
that of this group about 95% had income exceeding Rs. 668 and only 5% had income
exceeding Rs. 832. What was the lowest income among the richest 100?

Solution. Standard normal variate, $z = \frac{X - \bar{X}}{\sigma}$

Here X = 668, $\bar{X}$ = 750, σ = 50:

$$z = \frac{668 - 750}{50} = \frac{-82}{50} = -1.64$$

Area to the right of the ordinate at −1.64 is (0.4495 + 0.5000) = 0.9495.
∴ The expected number of persons getting above Rs. 668
= 10,000 × 0.9495 = 9,495
This is about 95% of the total, i.e., of 10,000.
The standard normal variate corresponding to 832 is

$$z = \frac{832 - 750}{50} = \frac{82}{50} = 1.64$$

Area to the right of the ordinate at 1.64 is
0.5000 − 0.4495 = 0.0505
The number of persons getting above Rs. 832 is
10,000 × 0.0505 = 505
This is approximately 5%.
Probability of getting the richest 100 = 100/10,000 = 0.01.
Standard normal variate having 0.01 area to its right = 2.33

$$2.33 = \frac{X - 750}{50}, \qquad X = 2.33 \times 50 + 750 = \text{Rs. } 866.5$$
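All three parts can be checked with `statistics.NormalDist`. This is a minimal sketch that assumes, as the question does, a normal income distribution. The variable names are ours.

```python
from statistics import NormalDist

income = NormalDist(mu=750, sigma=50)   # monthly income in Rs., assumed normal
n_persons = 10_000

above_668 = n_persons * (1 - income.cdf(668))
above_832 = n_persons * (1 - income.cdf(832))
# Lowest income among the richest 100: the point with 100/10,000 = 1% of incomes above it
lowest_of_richest_100 = income.inv_cdf(1 - 100 / n_persons)

print(f"above Rs. 668: {above_668:.0f} persons")
print(f"above Rs. 832: {above_832:.0f} persons")
print(f"lowest income among the richest 100: Rs. {lowest_of_richest_100:.1f}")
```

The counts match 9,495 and 505. The cut-off comes out near Rs. 866.3 rather than Rs. 866.5, because the exact 99th percentile uses z ≈ 2.326 instead of the rounded 2.33.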
Illustration 24. The mean weight of 500 male students in a certain college is 151
lb. and the standard deviation is 15 lb. Assuming the weights are normally distributed,
find how many students weigh (a) between 120 and 155 lb., and (b) more than 185 lb.

Solution. (a) Weights recorded as being between 120 and 155 lb. can actually
have any value from 119.5 to 155.5 lb., assuming they are recorded to the nearest pound.
Standard normal variate corresponding to 119.5 lb.:

$$z = \frac{119.5 - 151}{15} = -2.10$$

Standard normal variate corresponding to 155.5 lb.:

$$z = \frac{155.5 - 151}{15} = 0.30$$

Area between z = −2.10 and z = 0.30
= 0.4821 + 0.1179 = 0.6000.
∴ The number of students weighing between 120 and 155 lb.
= 500 × 0.6000 = 300.

(b) Students weighing more than 185 lb. must weigh at least 185.5 lb.
Standard normal variate corresponding to 185.5 lb.:

$$z = \frac{185.5 - 151}{15} = 2.3$$

Area to the right of z = 2.3 is (0.5000 − 0.4893) = 0.0107.
Number of students weighing more than 185 lb.
= 500 × 0.0107 = 5.35 or 5 approx.
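The continuity correction used above (treating "120 to 155 lb." as 119.5 to 155.5 lb.) carries over directly to code. This is a minimal sketch with `statistics.NormalDist`. It assumes normally distributed weights recorded to the nearest pound, as the solution does.

```python
from statistics import NormalDist

weights = NormalDist(mu=151, sigma=15)   # weights in lb., assumed normal
n_students = 500

# (a) recorded 120..155 lb. means actual weight between 119.5 and 155.5 lb.
between = n_students * (weights.cdf(155.5) - weights.cdf(119.5))
# (b) more than 185 lb. means at least 185.5 lb.
heavier = n_students * (1 - weights.cdf(185.5))
print(f"(a) {between:.1f} students  (b) {heavier:.2f} students")
```

It gives about 300 students for (a) and about 5.36 for (b), in line with the table-based figures.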
Illustration 25. 1,000 light bulbs with a mean life of 120 days are installed in a
new factory; their length of life is normally distributed with standard deviation 20
days. (i) How many bulbs will expire in less than 90 days? (ii) If it is decided to replace
all the bulbs together, what interval should be allowed between replacements if not more
than 10 per cent should expire before replacement?

Solution. (i) $\bar{X} = 120$, σ = 20, X = 90

Standard normal variate:

$$z = \frac{90 - 120}{20} = -1.5$$

Area of the curve from z = −1.5 up to the mean ordinate = 0.4332

Area to the left of −1.5 = 0.5 − 0.4332 = 0.0668

Number of bulbs expected to expire in less than 90 days
= 0.0668 × 1,000 = 66.8 or 67.

(ii) The value of the standard normal variate corresponding to an area
0.4 (0.5 − 0.1) is 1.28. Since the replacement point lies below the mean, z = −1.28:

$$-1.28 = \frac{X - 120}{20}, \qquad X = 120 - 1.28 \times 20 = 120 - 25.6 = 94.4 \approx 94$$

Hence the bulbs will have to be replaced after 94 days.
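Part (ii) is an inverse problem: we need the life that only 10% of bulbs fall below. In code this is a call to `inv_cdf` rather than a backwards table lookup. A minimal sketch, assuming normally distributed bulb life as stated:

```python
from statistics import NormalDist

life = NormalDist(mu=120, sigma=20)       # bulb life in days, assumed normal
expire_before_90 = 1000 * life.cdf(90)    # (i) expected number failing within 90 days
replace_after = life.inv_cdf(0.10)        # (ii) day by which 10% have failed
print(f"(i) {expire_before_90:.1f} bulbs; (ii) replace after {replace_after:.1f} days")
```

It gives about 66.8 bulbs and about 94.4 days, the same as the table-based answers.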


   σ       Probability
  1.00        0.159
  1.25        0.106
  1.50        0.067          (M. Com., Raj. 1973)

Solution. Average life of bulbs ($\bar{X}$) = 1,000 hours
σ = 200 hours
X = burning hours = 700

$$z = \frac{X - \bar{X}}{\sigma} = \frac{700 - 1{,}000}{200} = -1.5$$

Area to the left of (−1.5) = 0.067
Number of bulbs expected to fail in the first 700 hours
= 0.067 × 2,000 = 134.


Solution. (a) Given $\bar{X} = 42$, X = 50, σ = 24

$$z = \frac{X - \bar{X}}{\sigma} = \frac{50 - 42}{24} = 0.333$$

Area to the right of the ordinate at 0.333 is 0.5 − 0.1304 = 0.3696
∴ The expected number of children exceeding a score of 50
= 0.3696 × 1,000 = 369.6 or 370.

(b) Standard normal variate for score 30:

$$z = \frac{30 - 42}{24} = -0.5$$

Standard normal variate for score 54:

$$z = \frac{54 - 42}{24} = 0.5$$

Area to the right of 0.5
= 0.5 − 0.1915 = 0.3085
Area to the left of −0.5 = 0.3085.
The probability of children having a score above 54 or below 30
= 0.3085 + 0.3085 = 0.6170
∴ The probability of children having a score between 30 and 54
= 1 − 0.617 = 0.383
Thus the number of children having a score between 30 and 54
= 0.383 × 1,000 = 383.

(c) Probability of getting the top 100 students = 100/1,000 = 0.1
Standard normal variate having 0.1 area to the right = 1.281
Standard normal variate for score X:

$$1.281 = \frac{X - 42}{24}, \qquad X = (1.281 \times 24) + 42 = 72.74 \text{ or } 73$$

Illustration 28. In a certain examination the percentages of passes and dis-
tinctions were 46 and 9 respectively. Estimate the average marks obtained by the
candidates, the minimum pass and distinction marks being 40 and 75 respectively
(assume the distribution of marks to be normal).

Also determine what would have been the minimum qualifying marks for admis-
sion to a re-examination of the failed candidates, had it been desired that the best 25%
of them should be given another opportunity of being examined.
(M. Com., Delhi, 1972)


Solution. (a) Let $\bar{X}$ be the mean and σ the standard deviation of the normal
distribution. The area to the right of the ordinate at X = 40 is 0.46, and hence the area
between the mean and the ordinate at X = 40 is 0.04.

Now, from the tables, the standard normal variate corresponding to 0.04 is 0.1:

$$\frac{40 - \bar{X}}{\sigma} = 0.1 \quad \text{or} \quad 40 - \bar{X} = 0.1\sigma \quad \ldots (i)$$

Similarly, 9% of the candidates obtain 75 or more, so the area between the mean and
the ordinate at X = 75 is 0.5 − 0.09 = 0.41, for which z = 1.34:

$$\frac{75 - \bar{X}}{\sigma} = 1.34 \quad \text{or} \quad 75 - \bar{X} = 1.34\sigma \quad \ldots (ii)$$

Subtracting (i) from (ii):

$$35 = 1.24\sigma, \qquad \sigma = \frac{35}{1.24} = 28.22$$

Putting the value of σ in equation (i):
$\bar{X}$ = 37.18 or 37.

Therefore, the average marks obtained by the candidates is 37.

(b) Let us assume that $X_1$ is the minimum qualifying mark for admission to a
re-examination.

The area to the right of X = 40 is 46%.

∴ Percentage of students failing = 54, and this is the area to the left of 40. We
want the best 25% of these failed candidates to be given a chance to reappear.
Suppose this area is equal to the shaded area in the diagram.

This area = 25% of 54 = 13.5%.

∴ Area between the mean and the ordinate at $X_1$
= −(0.135 − 0.04) = −0.095.
(The negative sign is included because the area lies to the left of the mean ordinate.)
Corresponding to this area, the standard normal variate from the table is −0.24
(the table gives an area of 0.0948 for z = 0.24, the entry nearest to 0.095).

$$\frac{X_1 - \bar{X}}{\sigma} = -0.24, \qquad X_1 = \bar{X} - 0.24\sigma$$

$$X_1 = 37.2 - (0.24 \times 28.2) = 37.2 - 6.77 = 30.43 \text{ or } 30 \text{ approx.}$$
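Both parts of Illustration 28 can be checked by solving directly from the stated conditions. The conditions are P(X < 40) = 0.54 and P(X < 75) = 0.91 for part (a). For part (b), the best quarter of the 54% who failed must lie between $X_1$ and 40, so $P(X < X_1) = 0.54 - 0.135 = 0.405$. The following is a minimal sketch with `statistics.NormalDist`. The variable names are ours.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal

# (a) 46% pass (40 or more) and 9% get distinction (75 or more)
z40 = STD.inv_cdf(0.54)          # P(X < 40) = 0.54
z75 = STD.inv_cdf(0.91)          # P(X < 75) = 0.91
sigma = (75 - 40) / (z75 - z40)
mean = 40 - z40 * sigma

# (b) best 25% of the 54% who failed lie between X1 and 40
p_below_x1 = 0.54 - 0.25 * 0.54  # = 0.405
x1 = mean + STD.inv_cdf(p_below_x1) * sigma

print(f"mean = {mean:.2f}, sigma = {sigma:.2f}, X1 = {x1:.2f}")
```

It gives a mean of about 37.17, σ of about 28.22, and $X_1$ of about 30.4.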
Illustration 29. As a result of tests on 20,000 electric bulbs manufactured by a
company, it was found that the lifetime of the bulb was normally distributed with an
average life of 2,040 hours and a standard deviation of 60 hours. On the basis of this
information, estimate the number of bulbs expected to burn for (a) more than
2,150 hours, and (b) less than 1,960 hours.

Proportion of Area under the Normal Curve

   z      Area      z      Area      z      Area
  1.23   0.3907    1.33   0.4082    1.43   0.4236
  1.63   0.4484    1.73   0.4582    1.83   0.4664

(M. Com., Delhi, 1973)

Solution. (a)

$$z = \frac{2150 - 2040}{60} = \frac{110}{60} = 1.833$$

Area to the right of the ordinate at 1.833
= 0.5 − 0.4667 = 0.0333

The number of bulbs expected to burn more than 2,150 hours
= 0.0333 × 20,000 = 666

(b) X = 1,960, $\bar{X}$ = 2,040, σ = 60

$$z = \frac{1960 - 2040}{60} = \frac{-80}{60} = -1.333$$

Area to the left of the ordinate at −1.333
= 0.5 − 0.4082 = 0.0918

∴ The number of bulbs expected to burn for less than 1,960 hours
= 0.0918 × 20,000 = 1,836.
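For comparison, here is a minimal sketch that uses exact normal areas from `statistics.NormalDist` in place of the rounded table entries. As in the question, it assumes bulb life is normally distributed.

```python
from statistics import NormalDist

life = NormalDist(mu=2040, sigma=60)   # bulb life in hours, assumed normal
n_bulbs = 20_000

more_than_2150 = n_bulbs * (1 - life.cdf(2150))
less_than_1960 = n_bulbs * life.cdf(1960)
print(f"(a) {more_than_2150:.0f} bulbs  (b) {less_than_1960:.0f} bulbs")
```

Exact areas give about 667 and 1,824 bulbs. The text's 666 and 1,836 come from the table values, which use z = 1.33 in place of 1.333 in part (b).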

Illustration 30. (a) 15,000 students appeared for an examination. The mean mark
was 49 and the distribution of marks had a standard deviation of 6. Assuming the
marks to be normally distributed, what proportion of students scored more than 55
marks?

(b) If in the same examination, Grade 'A' is to be given to students scoring more
than 70 marks, what proportion of the students will receive Grade 'A'?

Solution. (a) $z = \frac{X - \bar{X}}{\sigma}$, with X = 55, $\bar{X}$ = 49, σ = 6:

$$z = \frac{55 - 49}{6} = 1$$

The proportion of students scoring between 49 and 55 (or between zero and
1 on the standard scale) is 0.3413.
∴ The proportion of students scoring more than 55 marks is
0.5 − 0.3413 = 0.1587 or 15.87%.

(b)

$$z = \frac{70 - 49}{6} = \frac{21}{6} = 3.5$$

The area under the standard normal curve corresponding to 3.5
= 0.4998

Therefore, 0.02 per cent (0.5 − 0.4998 = 0.0002) would score more than 70 marks.
Since there are 15,000 candidates, 3 candidates will receive Grade 'A'.

Illustration 31. The results of a particular examination are given below in
summary form:

Result                            % of candidates
(i)   Passed with distinction           10
(ii)  Passed                            60
(iii) Failed                            30

It is known that a candidate fails if he obtains less than 40 marks (out
of 100), while he must obtain at least 75 marks in order to pass with distinction.
Determine the mean and standard deviation of the distribution of marks, assuming this
to be normal.  (M. Com., Delhi, 1970)

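Solution. We have to calculate the mean and standard deviation from the given
information. The following diagram will help in understanding the question and finding
its solution:

We know that 30% of the students get less than 40 marks, so the area between the
ordinate at 40 and the mean ordinate is 0.20.

∴ From the table, the z value corresponding to an area of 0.20 (20% area) is −0.524
(z = 0.52 for 0.1985 and z = 0.53 for 0.2019, so z ≈ 0.524 for 0.2000).

Hence
$$\frac{40 - \bar X}{\sigma} = -0.524 \qquad \text{(i)}$$

10% of the students get distinction marks, i.e., 75 or more, so the area between the
mean ordinate and the ordinate at 75 is 0.40.
∴ From the table, the z value corresponding to an area of 0.40 (40% area) is 1.28.

Hence
$$\frac{75 - \bar X}{\sigma} = 1.28 \qquad \text{(ii)}$$

From equations (i) and (ii):
$$\bar X - 40 = 0.524\sigma$$
$$-\bar X + 75 = 1.280\sigma$$
Adding,
$$35 = 1.804\sigma \quad\Rightarrow\quad \sigma = \frac{35}{1.804} = 19.4$$
Substituting in (i),
$$40 - \bar X = -0.524 \times 19.4 = -10.17 \quad\Rightarrow\quad \bar X = 50.17$$

Hence the mean of the distribution is 50.17 and the standard deviation is 19.4.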
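The same two-equation argument can be sketched in Python, assuming Python 3.8 or later for `statistics.NormalDist`, whose `inv_cdf` method stands in for the table look-up. With unrounded z values (about −0.5244 and 1.2816) the sketch gives a mean of about 50.16 and a standard deviation of about 19.38, agreeing with the worked answer to the precision of the table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# 30% of candidates score below 40, so 40 sits at the 30th percentile.
z_fail = std_normal.inv_cdf(0.30)   # about -0.524
# 10% score 75 or more, so 75 sits at the 90th percentile.
z_dist = std_normal.inv_cdf(0.90)   # about 1.28

# Solve (40 - mean)/sd = z_fail and (75 - mean)/sd = z_dist together:
# subtracting the two equations eliminates the mean.
sd = (75 - 40) / (z_dist - z_fail)
mean = 40 - z_fail * sd

print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```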
FITTING A NORMAL CURVE

There are two main objects of fitting a normal curve to sample data:

1. To provide a visual device for judging whether or not the normal
curve is a good fit to the sample data, and

2. To use the smoothed normal curve, instead of the irregular curve
representing the sample data, to estimate the characteristics of the
population.

Methods of Fitting

The following two methods are used to fit a normal curve to an
observed set of data:

1. Method of ordinates
2. Method of areas.
1. Method of Ordinates

In order to draw a curve on a graph, we need the frequencies which 
are represented on the ordinate and the values of the variable which are 
shown on the abscissa. Hence in order to fit a curve we must know the 
frequencies (ordinates) at the various points of the abscissa scale. While 
fitting a normal curve the ordinates are obtained at various sigma distances 
from the mean. The procedure of obtaining ordinates is as follows : 


The height of the ordinate at any point X is given by

$$y = \frac{Ni}{\sigma\sqrt{2\pi}}\, e^{-(X - \bar X)^2/2\sigma^2}$$

where $\sqrt{2\pi} = 2.50663$ and $e = 2.71828$.

Thus we need the values of N, X̄ and σ in order to fit a normal curve
to a distribution. In the expression above, when (X − X̄) = 0 the exponent of
2.71828 is zero, and 2.71828 raised to the zero power is one. Thus the expression
$e^{-x^2/2\sigma^2}$ (where x = X − X̄) is always equal to 1 for the ordinate erected at the mean.
The mean ordinate can easily be obtained by

$$y_0 = \frac{Ni}{2.50663\,\sigma} = 0.399\left(\frac{Ni}{\sigma}\right)$$

$y_0$ denotes the ordinate to be erected at the mean and i denotes the
width of the class interval.

This is the maximum ordinate of the fitted curve. The height of the
ordinate at a distance of 1σ from the mean would be calculated as follows:

$$y_1 = 0.399\left(\frac{Ni}{\sigma}\right) e^{-\frac{1}{2}(1)^2} = 0.60653\,y_0$$

In a similar manner the height of the ordinate at a distance of 2σ
from the mean would be calculated.

In practice it is not necessary to compute the ordinates at various
sigma distances from the mean in the above manner. Only the height of
the mean ordinate is calculated in the manner given above, and the heights
of the other ordinates are found from a specially prepared mathematical table
which gives the ordinates of the Normal Probability Curve. (The table is
given at the end of the book.) To consult the table, the deviation from the
mean is expressed in standard deviation units, i.e., as the standard normal
variate x/σ = (X − X̄)/σ, and the corresponding value is read. This value
gives the height of the ordinate as a proportion of the mean ordinate.
Therefore, to get the value of the ordinate at that point, this has to be
multiplied by the height of the mean ordinate.

While fitting the normal curve the question is at what points on the 
horizontal scale should the ordinates be erected. Generally the ordinates 
are erected at class mid-points. 

Illustration 32. Fit a normal curve to the following data by (a) the method of
ordinates, and (b) the method of areas.

Variable    Frequency
60–62           5
63–65          18
66–68          42
69–71          27
72–74           8

Solution. (a) Fitting the normal curve by the method of ordinates.
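For fitting the normal curve we need the values of X̄ and σ.

CALCULATION OF X̄ AND σ  (A = 67, i = 3, d' = (m − 67)/3)

Variable   m     f     d'    fd'    fd'²
60–62     61     5    −2    −10     20
63–65     64    18    −1    −18     18
66–68     67    42     0      0      0
69–71     70    27     1     27     27
72–74     73     8     2     16     32
               N = 100     Σfd' = 15   Σfd'² = 97

$$\bar X = A + \frac{\Sigma fd'}{N} \times i = 67 + \frac{15}{100} \times 3 = 67.45$$

$$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times i = \sqrt{0.97 - 0.0225} \times 3 = 0.9734 \times 3 = 2.92$$

The mean ordinate
$$y_0 = 0.399\,\frac{Ni}{\sigma}, \qquad N = 100,\ i = 3,\ \sigma = 2.92$$
$$y_0 = \frac{0.399 \times 100 \times 3}{2.92} = 41$$

Therefore, the height of the maximum ordinate at the point 67.45 (mean) on the
abscissa will be 41.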


Height of the ordinate at 1σ from the mean

This can be obtained either from the equation or from the specially prepared
mathematical table which gives the ordinates of the normal probability curve. Using
the table, the height of the ordinate at 1σ from the mean
= 0.60653 × mean ordinate
= 0.60653 × 41 = 24.87 or 25
Similarly, the height of the ordinate at 2σ from the mean
= 0.13534 × 41 = 5.55
and the height of the ordinate at 3σ from the mean
= 0.0111 × 41 = 0.455
With the help of the mean ordinate 41 and the ordinates at X̄ ± 1σ, X̄ ± 2σ and X̄ ± 3σ,
the curve can easily be plotted on graph paper. The points to be plotted on the left-hand
side of the mean ordinate are 0.455, 5.55 and 25, and the points on the right-hand side of
the mean ordinate are 25, 5.55 and 0.455.

The following table shows how to obtain the heights of ordinates at the mid-points
of various class intervals:
                       x = (X − X̄)   x/σ      Proportionate         Height of ordinate
Variable   Mid-point                           ordinate,             or expected
                                               i.e., e^(−x²/2σ²)     frequency
(1)        (2)         (3)           (4)      (5)                   (6)
60–62      61          −6.45         −2.209   0.08679               3.56 or 4
63–65      64          −3.45         −1.182   0.49750               20.40 or 20
66–68      67          −0.45         −0.154   0.98820               40.52 or 41
69–71      70          +2.55         +0.873   0.68313               28.01 or 28
72–74      73          +5.55         +1.900   0.16448               6.74 or 7
                                                                    N = 100


Steps in Computing Height of Ordinates
The steps in the computation of heights of ordinates are:
1. Find the mid-points of the various classes. (Col. 2)

2. Take the deviation of each of the mid-points from the mean, i.e.,
obtain (X − X̄), and denote this column by x. (Col. 3)

3. Divide each value of x by σ, i.e., obtain x/σ. (Col. 4)

4. Read the proportionate height at the distance corresponding to x/σ from the
table giving ordinates of the normal curve. (Col. 5)

5. Multiply each figure of Col. 5 by the height of the mean ordinate.
The resultant figures are the heights of the ordinates at
various distances from the mean, or the expected frequencies of the normal
distribution at various class mid-points. (Col. 6)
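The five steps can be sketched in Python as follows. Instead of reading Col. 5 from the table of ordinates, the sketch evaluates e^(−x²/2σ²) directly, and it uses the text's 0.399 for 1/√(2π); the variable names are our own. Its heights agree with Col. 6 to within a few hundredths.

```python
from math import exp, sqrt

# Illustration 32: class mid-points, frequencies and class width
mids = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]
i = 3
A = 67  # assumed mean used for the step deviations

N = sum(freqs)
d = [(m - A) // i for m in mids]  # step deviations d' = -2, -1, 0, 1, 2
sum_fd = sum(f * dd for f, dd in zip(freqs, d))
sum_fd2 = sum(f * dd * dd for f, dd in zip(freqs, d))

mean = A + sum_fd / N * i                        # 67.45
sd = sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * i   # about 2.92

y0 = 0.399 * N * i / sd  # height of the mean ordinate, about 41

heights = []
for m in mids:
    x = m - mean                      # Col. 3
    z = x / sd                        # Col. 4
    proportion = exp(-z * z / 2)      # Col. 5: ordinate as a fraction of y0
    heights.append(y0 * proportion)   # Col. 6
    print(m, round(z, 3), round(proportion, 5), round(heights[-1], 2))
```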


2. Method of Areas

The area under the normal curve represents the total frequency.
Tables have been prepared which give areas under the
normal curve. These tables give the proportion or percentage of area
between an ordinate erected at the mean and an ordinate at a given
distance from the mean in standard deviation units.

The mean of the distribution is located at the centre of the distribution,
and its ordinate cuts the curve into two equal parts. Mean ± 1σ
covers 68.27 per cent of the items, mean ± 2σ covers 95.45 per
cent and mean ± 3σ covers 99.73 per cent. When the method
of areas is used to fit the normal curve, we obtain areas (frequencies) within
the various class intervals.

The method of areas has been applied to the problem of Illustration
32 to obtain the expected frequencies of the normal distribution.
FITTING THE NORMAL CURVE BY THE METHOD OF AREAS

Class        Lower     Observed    x = (X − 67.45)   x/σ      Area under       Area for    Expected
limits       class     frequency                              normal curve     each        frequency
             limits                                           from 0 to x/σ    class
(1)          (2)       (3)         (4)               (5)      (6)              (7)         (8)
59.5–62.5    59.5       5          −7.95             −2.72    0.4967           0.0413      4.13 or 4
62.5–65.5    62.5      18          −4.95             −1.70    0.4554           0.2068      20.68 or 21
65.5–68.5    65.5      42          −1.95             −0.67    0.2486           0.3892      38.92 or 39
68.5–71.5    68.5      27          +1.05             +0.36    0.1406           0.2771      27.71 or 28
71.5–74.5    71.5       8          +4.05             +1.39    0.4177           0.0743      7.43 or 7
             74.5                  +7.05             +2.41    0.4920


Steps. The method of computing the expected frequencies is given
below:

(i) Column (1) gives the class intervals of the series to which the
normal curve is to be fitted.

(ii) Column (2) gives the lower class limits of the various class
intervals.

(iii) Column (3) gives the actual or observed frequencies.

(iv) Column (4) gives the deviations of the lower class limits from
the mean, i.e., (X − X̄). Thus for the class interval 59.5–62.5 we have
the value (59.5 − 67.45), or −7.95, and so on.

(v) In Column (5) the deviations are expressed in units of the standard deviation
of the series (σ = 2.92).

(vi) Column (6) has been derived from the table of areas under the
normal curve. For example, corresponding to x/σ = −2.72 the area under
the normal curve between the mean and that ordinate is 0.4967.

(vii) Column (7) is a column of differences. It gives areas under
the normal curve between successive values of z. These are obtained by
subtracting successive areas in the sixth column when the corresponding
z's have the same sign, and adding them when the z's have opposite
signs (which occurs only once in the table).

(viii) Column (8) gives the expected frequencies, which have been
obtained by multiplying the relative frequencies obtained in Col. (7) by
the number of observations, i.e., 100 in this case.
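A minimal Python sketch of the method of areas, assuming Python 3.8+ for `statistics.NormalDist`. Its `cdf` method gives cumulative areas, so differencing them at successive class boundaries yields Col. (7) directly and the same-sign/opposite-sign rule of step (vii) is handled automatically. Because the areas are computed rather than read at two-decimal z values, the expected frequencies differ from Col. (8) by at most about 0.1, and they round to the same whole numbers.

```python
from statistics import NormalDist

mean, sd, N = 67.45, 2.92, 100          # from the calculation above
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
observed = [5, 18, 42, 27, 8]

fitted = NormalDist(mean, sd)

# Area in each class = cumulative area at the upper limit minus that at the lower limit.
classes = list(zip(boundaries, boundaries[1:]))
expected = [N * (fitted.cdf(upper) - fitted.cdf(lower)) for lower, upper in classes]

for (lower, upper), obs, exp_f in zip(classes, observed, expected):
    print(f"{lower}-{upper}: observed {obs}, expected {exp_f:.2f}")
```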


Illustration 33. The following table gives the distribution of heights of first-year
students of a college:

Height in inches  61  62  63  64  65  66  67   68   69   70  71  72  73  74
Frequency          2  10  11  38  57  93  106  126  109  87  75  23   9   4

Test the normality of the distribution by comparing the proportion of cases
lying between X̄ ± σ, X̄ ± 2σ and X̄ ± 3σ for the distribution and for the normal curve.
Solution: CALCULATION OF X̄ AND σ  (d = X − 67)

X     f      d      fd      fd²
61    2     −6     −12       72
62   10     −5     −50      250
63   11     −4     −44      176
64   38     −3    −114      342
65   57     −2    −114      228
66   93     −1     −93       93
67  106      0       0        0
68  126      1     126      126
69  109      2     218      436
70   87      3     261      783
71   75      4     300    1,200
72   23      5     115      575
73    9      6      54      324
74    4      7      28      196
  N = 750        Σfd = 675   Σfd² = 4,801

$$\bar X = A + \frac{\Sigma fd}{N} = 67 + \frac{675}{750} = 67 + 0.9 = 67.9$$

$$\sigma = \sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2} = \sqrt{\frac{4801}{750} - \left(\frac{675}{750}\right)^2} = \sqrt{6.40 - 0.81} = \sqrt{5.59} = 2.36$$

X̄ ± 1σ = 67.9 ± 2.36 = (65.54 and 70.26).
Number of students having heights in the range X̄ ± 1σ
= 93 + 106 + 126 + 109 + 87 = 521
Proportion = 521/750 = 0.695, or 69.5%

X̄ ± 2σ = 67.9 ± 4.72 = (63.18 and 72.62).
Number of students having heights in the range X̄ ± 2σ
= 11 + 38 + 57 + 93 + 106 + 126 + 109 + 87 + 75 + 23 = 725
Proportion = 725/750 = 0.967, or 96.7%

X̄ ± 3σ = 67.9 ± 7.08 = (60.82 and 74.98).
Number of students having heights in the range X̄ ± 3σ = 750, and hence the proportion is 100%.

In a normal distribution the proportions lying between these limits are 68.27%, 95.45% and
99.73% respectively. Hence the given distribution is approximately normal.
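The comparison can be sketched in Python. One assumption to flag: the sketch counts a height class only when its recorded value X lies inside X̄ ± kσ, so for ±2σ (about 63.17 to 72.63) it excludes the 63-inch class and finds 714 students (95.2%), whereas the count above also includes that class, whose range 62.5 to 63.5 straddles the lower limit. The ±1σ and ±3σ counts agree with the text.

```python
from math import sqrt

heights = list(range(61, 75))  # 61, 62, ..., 74 inches
freqs = [2, 10, 11, 38, 57, 93, 106, 126, 109, 87, 75, 23, 9, 4]

N = sum(freqs)
A = 67  # assumed mean, d = X - 67
sum_fd = sum(f * (x - A) for x, f in zip(heights, freqs))
sum_fd2 = sum(f * (x - A) ** 2 for x, f in zip(heights, freqs))

mean = A + sum_fd / N
sd = sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

normal_pct = {1: 68.27, 2: 95.45, 3: 99.73}  # theoretical percentages
counts = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    # count only heights whose recorded value lies inside the interval
    counts[k] = sum(f for x, f in zip(heights, freqs) if low <= x <= high)
    print(f"mean ± {k}σ: {counts[k]} of {N} = {100 * counts[k] / N:.1f}% "
          f"(normal: {normal_pct[k]}%)")
```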
MISCELLANEOUS ILLUSTRATIONS 
Illustration 34. (a) Bring out the fallacy, if any, in the following statement:
"The mean of a binomial distribution is 15 and its standard deviation is 5."
(b) Find the binomial distribution whose mean is 6 and variance 4.

Solution:
(a) The mean of a binomial distribution is np and its standard deviation is √(npq).

Given np = 15 and √(npq) = 5.
Since √(npq) = 5, npq = 25.
Dividing, q = npq/np = 25/15 = 1.67.
We know that p + q = 1. Hence q can in no case exceed 1. But in this question q is
greater than 1 (1.67), which is not possible. The statement is, therefore, wrong.
(b) We are given:
np = 6 ...(i)
npq = 4 ...(ii)
Dividing (ii) by (i), we have
q = 4/6 = 2/3 and p = 1 − 2/3 = 1/3
Substituting the value of p in (i),
n × 1/3 = 6
n = 6 × 3 = 18
Hence the required binomial distribution is

$$(q + p)^n = \left(\frac{2}{3} + \frac{1}{3}\right)^{18}$$
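The check in part (a), and the reverse calculation in part (b), can be wrapped in a small Python function. `binomial_parameters` is a hypothetical helper of our own: it takes the stated mean and standard deviation, recovers q = npq/np, and rejects the statement when q does not lie strictly between 0 and 1. For part (b) the S.D. is √4 = 2.

```python
def binomial_parameters(mean, sd):
    """Recover (n, p, q) from a binomial mean np and standard deviation sqrt(npq).

    Raises ValueError when q = npq / np is not a valid probability,
    which is exactly the fallacy examined in part (a).
    """
    variance = sd ** 2          # npq
    q = variance / mean         # npq / np
    if not 0 < q < 1:
        raise ValueError(f"inconsistent: q = {q:.2f} cannot be a probability")
    p = 1 - q
    n = mean / p
    return n, p, q


n, p, q = binomial_parameters(6, 2)   # part (b): mean 6, variance 4
print(round(n), round(p, 4), round(q, 4))

try:
    binomial_parameters(15, 5)        # part (a): mean 15, S.D. 5
except ValueError as err:
    print(err)
```

The helper does not check that n comes out as a whole number, which a genuine binomial distribution also requires; the same call with a mean of 80 and an S.D. of 8 gives n = 400, p = 0.2, q = 0.8.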


Illustration 35. If the probability of a defective bolt is 0.1, find (a) the mean,
and (b) the standard deviation of defective bolts in a total of 900. Also calculate
skewness and kurtosis.

Solution.

(a) Mean = np = 900(0.1) = 90, i.e., we can expect 90 bolts to be defective.

(b) Variance = npq = 900(0.1)(0.9) = 81

Hence the standard deviation = √81 = 9

$$\text{Skewness} = \frac{q - p}{\sqrt{npq}} = \frac{0.9 - 0.1}{9} = 0.089$$

$$\text{Kurtosis } (\beta_2) = 3 + \frac{1 - 6pq}{npq} = 3 + \frac{1 - 6(0.1)(0.9)}{81} = 3 + \frac{0.46}{81} = 3 + 0.006 = 3.006$$

The distribution is slightly leptokurtic.
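A short Python sketch of the same binomial moment formulas, skewness (q − p)/√(npq) and Pearson's β₂ = 3 + (1 − 6pq)/npq; the variable names are our own.

```python
from math import sqrt

n, p = 900, 0.1
q = 1 - p
npq = n * p * q

mean = n * p                          # expected number of defective bolts
sd = sqrt(npq)                        # standard deviation
skewness = (q - p) / sd               # moment coefficient of skewness
beta2 = 3 + (1 - 6 * p * q) / npq     # Pearson's measure of kurtosis

print(f"mean = {mean:.0f}, S.D. = {sd:.0f}, "
      f"skewness = {skewness:.3f}, beta2 = {beta2:.3f}")
```

A β₂ slightly above 3 is what marks the distribution as (slightly) leptokurtic.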

Illustration 36. The probability that an evening college student will graduate
is 0.4. Determine the probability that out of 5 students (a) none, (b) one, and (c) at
least one will graduate.  (B. Com., Bombay, 1974)

Solution. Here p = 0.4, q = 0.6 and n = 5. Applying the binomial distribution,

$$P(r) = {}^nC_r\, q^{n-r} p^r$$

(a) Probability that none will graduate
$$= {}^5C_0 \times (0.4)^0 \times (0.6)^5 = 0.0778 \text{ or } 0.08$$

(b) Probability that one will graduate
$$= {}^5C_1 \times 0.4 \times (0.6)^4 = 0.2592 \text{ or } 0.26$$

(c) Probability that at least one will graduate = 1 − (probability that none will
graduate)
= 1 − 0.08 = 0.92
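The binomial calculations can be checked with a few lines of Python (3.8+ for `math.comb`). `binomial_pmf` is a hypothetical helper written to match the formula P(r) = ⁿCᵣ qⁿ⁻ʳ pʳ; without rounding, the three answers come out as 0.07776, 0.2592 and 0.92224.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(exactly r successes in n trials) = nCr * q**(n - r) * p**r."""
    q = 1 - p
    return comb(n, r) * q ** (n - r) * p ** r


n, p = 5, 0.4
none = binomial_pmf(0, n, p)      # (a) no student graduates
one = binomial_pmf(1, n, p)       # (b) exactly one graduates
at_least_one = 1 - none           # (c) complement of "none"

print(round(none, 5), round(one, 4), round(at_least_one, 5))
```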

Illustration 37. Out of 800 families with 4 children each, what percentage would
be expected to have (a) 2 boys and 2 girls, (b) at least one boy, (c) no girls, and (d) at
most 2 girls? Assume equal probabilities for boys and girls.

Solution. (i) Probability of getting a boy = 1/2, i.e., p = 1/2
Probability of getting a girl = 1/2, i.e., q = 1/2
Probability of getting 2 boys and 2 girls
$$= {}^nC_r\, q^{n-r} p^r = {}^4C_2 \left(\tfrac12\right)^2 \left(\tfrac12\right)^2 = 6 \times \frac{1}{16} = \frac{3}{8}$$
Percentage of families expected to have two boys and two girls
= 3/8 × 100 = 37.5%

(ii) The probability of getting at least one boy is the sum of the probabilities of
getting 1 boy and 3 girls, 2 boys and 2 girls, 3 boys and 1 girl, and 4 boys.
$$P(\text{1 boy and 3 girls}) = {}^4C_1 \left(\tfrac12\right)^3 \left(\tfrac12\right)^1 = \frac{4}{16} = \frac14$$
$$P(\text{2 boys and 2 girls}) = \frac38 \quad \text{[see part (i)]}$$
$$P(\text{3 boys and 1 girl}) = {}^4C_3 \left(\tfrac12\right)^1 \left(\tfrac12\right)^3 = \frac14$$
$$P(\text{4 boys and no girl}) = {}^4C_4 \left(\tfrac12\right)^0 \left(\tfrac12\right)^4 = \frac{1}{16}$$
∴ Probability of getting at least one boy
$$= \frac14 + \frac38 + \frac14 + \frac{1}{16} = \frac{15}{16}$$
Percentage of families expected to have at least one boy
= 15/16 × 100 = 93.75%
(iii) Percentage of families expected to have no girls, i.e., all boys
= 1/16 × 100 = 6.25%
(iv) Probability of getting at most 2 girls
$$= \frac{1}{16} + \frac14 + \frac38 = \frac{11}{16}$$
Percentage of families expected to have at most 2 girls
= 11/16 × 100 = 68.75%.


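Because p = q = 1/2 here, Python's `fractions.Fraction` keeps every answer exact. The sketch below (`prob_boys` is our own helper name) reproduces the four fractions 3/8, 15/16, 1/16 and 11/16; "at most 2 girls" is treated as the same event as "at least 2 boys".

```python
from fractions import Fraction
from math import comb

n = 4                  # children per family
half = Fraction(1, 2)  # P(boy) = P(girl)


def prob_boys(k):
    """P(exactly k boys among n children) = nCk * (1/2)**n."""
    return comb(n, k) * half ** n


answers = {
    "(a) 2 boys and 2 girls": prob_boys(2),
    "(b) at least one boy": 1 - prob_boys(0),
    "(c) no girls": prob_boys(4),
    "(d) at most 2 girls": sum(prob_boys(k) for k in (2, 3, 4)),  # 2 or more boys
}

for label, pr in answers.items():
    print(f"{label}: {pr} = {float(pr) * 100:.2f}%")
```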
Illustration 38. The normal rate of infection of a certain disease in animals is
known to be 25%. In an experiment with 6 animals injected with a new vaccine, it was
observed that none of the animals caught the infection. Calculate the probability of the
observed result.

Solution. Let p denote the probability of infection by the disease.

p = 25/100 = 1/4 and q = 3/4

Out of 6 animals, the probabilities of 0, 1, 2, 3, 4, 5 and 6 animals catching the infection will be
given by the 1st, 2nd, 3rd, ... terms in the expansion of (3/4 + 1/4)⁶.
The probability of none of the animals being infected

$$= \left(\frac34\right)^6 = \frac{729}{4096} = 0.178$$
Illustration 39. A cigarette company wants to promote the sales of its
cigarettes (Brand x) with a special advertising campaign. Fifty out of every thousand
cigarettes are rolled up in gold foil and randomly mixed with the regular (special
king-sized, mentholated) cigarettes. The company offers to trade a new package of
cigarettes for each gold cigarette a smoker finds in a package of Brand x. What is the
probability that buyers of Brand x will find X = 0, 1, 2, 3, ... gold cigarettes in a single
package of 10?

Solution. This can be considered a Poisson experiment in which there are
n = 10 trials (in a pack) and the probability of finding a golden cigarette (a success) is
p = 50/1000 = 0.05. The expected number of golden cigarettes per pack is, therefore,
m = np = 10(0.05) = 0.5. Using the table of the Poisson distribution, we obtain

Number of golden          Probability
cigarettes per pack
0                         0.6065
1                         0.3033
2                         0.0758
3                         0.0126
4                         0.0016

We find that 60.65 per cent of the packages contain no golden cigarettes, 30.33
per cent contain one golden cigarette, 7.58 per cent contain 2 golden cigarettes, 1.26
per cent contain 3 golden cigarettes and 0.16 per cent contain 4 golden cigarettes.
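The Poisson table can be regenerated from the recurrence P(r) = P(r − 1) × m/r, starting from P(0) = e^(−m). Here is a minimal Python sketch with m = 0.5 as above; the rounded probabilities match the table. Strictly, the count in a pack of 10 is binomial, and the exact binomial P(0) = 0.95¹⁰ ≈ 0.599 differs slightly from the Poisson value 0.6065.

```python
from math import exp

n, p = 10, 0.05
m = n * p  # expected gold cigarettes per pack = 0.5

probs = [exp(-m)]                       # P(0) = e^(-m)
for r in range(1, 5):
    probs.append(probs[-1] * m / r)     # P(r) = P(r-1) * m / r

for r, pr in enumerate(probs):
    print(f"{r} gold cigarettes: {pr:.4f} ({100 * pr:.2f}% of packs)")
```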


Illustration 40. How would you use the normal distribution to find approximately
the frequency of exactly 5 successes in 100 trials, the probability of success in
each trial being p = 0.1?

Solution. Let n = number of trials, p = probability of success and q = the
probability of failure. Hence n = 100, p = 0.1 and q = 0.9. It is a case of binomial
distribution. For the binomial distribution,

mean = np and S.D. = √(npq)
Hence mean = np = 100 × 0.1 = 10
S.D. = √(100 × 0.1 × 0.9) = 3
We know that when the number of trials is large, the binomial distribution tends to the normal
distribution. The normal distribution is a continuous distribution. The frequency of exactly
5 successes in the binomial will correspond to the frequency of the class interval 4.5 to 5.5,
and the mean and standard deviation of the binomial distribution will correspond to the mean
and standard deviation of the normal distribution.
Standard normal variate corresponding to 4.5
$$= \frac{4.5 - np}{\sqrt{npq}} = \frac{4.5 - 10}{3} = -1.83$$
Standard normal variate corresponding to 5.5
$$= \frac{5.5 - np}{\sqrt{npq}} = \frac{5.5 - 10}{3} = -1.50$$
Area below the value −1.83 from the table = 0.0336
Area below the value −1.50 from the table = 0.0668
Area between the two = 0.0668 − 0.0336 = 0.0332

The frequency of the class interval 4.5 to 5.5
= N × 0.0332

∴ Approximate frequency of exactly 5 successes in 100 trials with p = 0.1
= 0.0332 × 100 = 3.32.
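The normal approximation with the continuity correction (4.5 to 5.5) can be set beside the exact binomial probability in a short Python sketch (Python 3.8+ for `math.comb` and `statistics.NormalDist`). Computed without table rounding, the approximation is about 0.0334 and the exact binomial value about 0.0339, so the approximation is reasonable here even though np = 10 is not very large.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.1
q = 1 - p
approx_dist = NormalDist(n * p, sqrt(n * p * q))   # mean 10, S.D. 3

# Continuity correction: "exactly 5" becomes the interval 4.5 to 5.5.
approx = approx_dist.cdf(5.5) - approx_dist.cdf(4.5)

# Exact binomial probability of exactly 5 successes, for comparison.
exact = comb(n, 5) * p ** 5 * q ** (n - 5)

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```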


Illustration 41. You are in charge of rationing in a State affected by food
shortage. The following reports were received from investigators:

Daily calories of food available per adult during current period

Area     Mean     S.D.
A        2,000    350
B        1,750    100

The estimated requirement of an adult is taken at 2,500 calories daily and the absolute
minimum at 1,000. Comment on the reported figures and determine which area in
your opinion needs more urgent attention.

Solution.
Area A                                     Area B
Mean ± 3σ                                  Mean ± 3σ
2,000 ± (3 × 350)                          1,750 ± (3 × 100)
Between 3,050 and 950 calories             Between 2,050 and 1,450 calories

Since the absolute minimum requirement is 1,000 calories, area A needs
more urgent attention, because there are people there who are getting less than 1,000
calories.
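The ±3σ argument implicitly treats calorie intake in each area as roughly normal. Under that assumption (ours, not stated in the reports), a short Python sketch estimates the share of adults below the 1,000-calorie minimum: about 0.2 per cent in Area A and practically none in Area B.

```python
from statistics import NormalDist

MINIMUM = 1_000  # absolute minimum daily calories per adult

# Assumption: daily calories available in each area are normally distributed.
areas = {
    "A": NormalDist(mu=2_000, sigma=350),
    "B": NormalDist(mu=1_750, sigma=100),
}

share_below = {name: dist.cdf(MINIMUM) for name, dist in areas.items()}
for name, share in share_below.items():
    print(f"Area {name}: {100 * share:.4f}% of adults below {MINIMUM} calories")
```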


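Illustration 42. Eight coins are thrown simultaneously. Find the chance of
obtaining
(i) at least 6 heads,
(ii) no heads, and
(iii) all heads.  (M. Com., Meerut, 1974)
Solution. (i) In tossing 8 coins simultaneously, the probability of getting
at least six heads will be given by the sum of the separate probabilities of getting 6 heads,
7 heads and 8 heads.
n = 8, r = 6, 7, 8, p = 1/2, q = 1/2
$$P(r = 6, 7, 8) = {}^8C_6 \left(\tfrac12\right)^2\left(\tfrac12\right)^6 + {}^8C_7 \left(\tfrac12\right)^1\left(\tfrac12\right)^7 + {}^8C_8 \left(\tfrac12\right)^0\left(\tfrac12\right)^8$$
$$= 28 \times \frac{1}{256} + 8 \times \frac{1}{256} + 1 \times \frac{1}{256} = \frac{37}{256}$$
(ii) Probability of getting no head is given by
$$P(r = 0) = {}^8C_0 \left(\tfrac12\right)^8 \left(\tfrac12\right)^0 = \frac{1}{256}$$
(iii) Probability of getting all heads is given by
$$P(r = 8) = {}^8C_8 \left(\tfrac12\right)^0 \left(\tfrac12\right)^8 = \frac{1}{256}$$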


Illustration 43. If on an average 9 ships out of 10 arrive safely at port, find
the mean and standard deviation of the number of ships arriving safely out of a total
of 500 ships.

Solution. Probability of safe arrival, p = 9/10 = 0.9
∴ q = 1 − p = 1 − 0.9 = 0.1
The mean number of ships arriving safely is given by
m = np = 500 × 0.9 = 450
The standard deviation will be given by:
$$\sigma = \sqrt{npq} = \sqrt{500 \times 0.9 \times 0.1} = \sqrt{45} = 6.71$$
Hence the mean and standard deviation of the number of ships arriving safely are
respectively 450 and 6.71.


Illustration 44. It is given that 3% of electric bulbs manufactured by a company
are defective. Using the Poisson approximation, find the probability that a sample of
100 bulbs will contain (i) no defective, (ii) exactly one defective.  (B. A., Bombay, 1975)

Solution. The expected number of defective bulbs in a sample of 100 is 3 (since 3% of
the bulbs manufactured are defective).

Hence m = np = 100 × 0.03 = 3

(i) The probability of no defective bulb in a sample of 100 is given by:

$$P(0) = e^{-m} = e^{-3} = 0.05 \text{ (from the table)}$$

(ii) $$P(1) = P(0) \times m = 0.05 \times 3 = 0.15$$

Illustration 45. Suppose that in key punching of 80-column IBM cards, the
arithmetic mean number of mistakes per card is 0.3. What per cent of cards will have
(i) no mistake, (ii) one mistake, and (iii) two mistakes?

Solution. We are given m = 0.3
Probability of 0 mistakes, i.e., no mistake (Rec. denotes the reciprocal and AL the antilog):
P(0) = e^(−m) = 2.71828^(−0.3)
= Rec. [AL(log 2.71828 × 0.3)]
= Rec. [AL(0.4343 × 0.3)]
= Rec. [AL 0.13029]
= Rec. 1.35 = 0.7408
This means that about 74 per cent of the cards punched will have no mistakes.
P(1) = P(0) × m
= 0.7408 × 0.3
= 0.22224
This means that about 22 per cent of the cards punched will have one
mistake.
P(2) = P(1) × m/2
= 0.22224 × 0.3/2 = 0.0333
This means that about 3 per cent of the cards punched will have two
mistakes.


Illustration 46. The components processed by a machine have been found to
have some defects. 50 components were selected at random and the number of defects
in each of them noted. The following table gives the information:

Number of defects in a component: 50 components

4 1 2 2 1 3 2 4 2 2
0 1 3 2 4 3 2 1 1 2
2 3 0 2 1 0 1 2 3 2
4 0 2 1 5 1 3 5 2 1
0 2 5 1 3 0 1 3 2 1

(a) Determine the probability distribution of the random variable "number of
defects in a component" and the frequency distribution based on the Poisson
distribution. (Given e^(−2) = 0.1353)

(b) Verify whether a Poisson distribution can be assumed. Use the χ² test.*

Table of χ²

d.f.              1      2      3      4
5% points of χ²   3.84   5.99   7.82   9.49

(M. Com., Meerut, 1976)
Solution. (a) CALCULATION OF EXPECTED FREQUENCIES

No. of defects   Frequency    fX
      X              f
      0              6          0
      1             13         13
      2             16         32
      3              8         24
      4              4         16
      5              3         15
            N = 50      ΣfX = 100

$$m = \bar X = \frac{\Sigma fX}{N} = \frac{100}{50} = 2$$
$$P(0) = e^{-m} = e^{-2} = 0.1353 \text{ (given)}$$

Expected frequencies:
N·P(0) = 50 × 0.1353 = 6.76
N·P(1) = N·P(0) × m = 6.76 × 2 = 13.52
N·P(2) = N·P(1) × m/2 = 13.52 × 2/2 = 13.52
N·P(3) = N·P(2) × m/3 = 13.52 × 2/3 = 9.01
N·P(4) = N·P(3) × m/4 = 9.01 × 2/4 = 4.505
N·P(5) = N·P(4) × m/5 = 4.505 × 2/5 = 1.80

*For details please refer to the Chapter on the χ² test.
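The tallying and the Poisson recurrence can be checked with a short Python sketch. The f/N column is the observed probability distribution asked for in part (a). The sketch uses the exact e^(−2) rather than the rounded 0.1353 given in the question, so some expected frequencies come out slightly higher in the second decimal place (6.77, 13.53, 13.53, 9.02, 4.51, 1.80).

```python
from collections import Counter
from math import exp

# Number of defects in each of the 50 components (rows of the table above)
defects = [
    4, 1, 2, 2, 1, 3, 2, 4, 2, 2,
    0, 1, 3, 2, 4, 3, 2, 1, 1, 2,
    2, 3, 0, 2, 1, 0, 1, 2, 3, 2,
    4, 0, 2, 1, 5, 1, 3, 5, 2, 1,
    0, 2, 5, 1, 3, 0, 1, 3, 2, 1,
]

N = len(defects)
observed = Counter(defects)
m = sum(defects) / N  # mean defects per component = 2.0

# Expected Poisson frequencies: N*P(0) = N*e^(-m), then N*P(r) = N*P(r-1) * m / r
expected = [N * exp(-m)]
for r in range(1, 6):
    expected.append(expected[-1] * m / r)

print("X   f   f/N   expected")
for r in range(6):
    print(f"{r}  {observed[r]:2d}  {observed[r] / N:.2f}  {expected[r]:6.2f}")
```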


Illustration 47. Certain mass-produced articles, of which 0.5 per cent are defective,
are packed in cartons each containing 100 articles. What proportion of cartons
are free from defective articles, and what proportion contain 2 or more defectives?

(Given e^(−0.5) = 0.6065)  (M. Com., Meerut, 1974)
Solution. m = np = 100 × 0.005 = 0.5

P(0) = e^(−0.5) = 0.6065 (given)
P(1) = P(0) × m = 0.6065 × 0.5 = 0.30325
The probability of a carton with 2 or more defectives
= 1 − 0.6065 − 0.30325 = 0.09025.
Therefore the proportion of cartons free from defective articles is 60.65%, or
61%, and the proportion with 2 or more defectives is 9% (approx.).
Illustration 48. The following table gives frequencies of occurrence of a variate
x between certain limits:

Variate (x)                          Frequency (f)
Less than 40                              30
40 or more but less than 50               33
50 and more                               37
                                         100

The distribution is exactly normal. Find the average and standard deviation
of x.  (M. Com., Delhi, 1974)
Solution. Area between the ordinate at 40 and the mean ordinate
= (50 − 30)% = 20%
Area between the ordinate at 50 and the mean ordinate
= (50 − 37)% = 13%

For the proportion of area 0.20, z = 0.5244; for the proportion of area 0.13, z = 0.3318.

$$\frac{\bar X - 40}{\sigma} = 0.5244 \qquad \text{(i)}$$
$$\frac{50 - \bar X}{\sigma} = 0.3318 \qquad \text{(ii)}$$
Adding (i) and (ii): $10 = 0.8562\,\sigma$, or $\sigma = 11.68$.

Putting the value of σ in equation (i):
$\bar X - 40 = 0.5244 \times 11.68$, or $\bar X = 40 + 6.12 = 46.12$.


Illustration 49. Is there any inconsistency in the statement "the mean of a binomial
distribution is 80 and its standard deviation 8"? If no inconsistency is found, what
are the values of p, q and n?
(Degree Prog. in B. Adm. & Commerce, T.U., 1977)

Solution. The mean of a binomial distribution is given by np and the standard
deviation by √(npq).
Here np = 80 and √(npq) = 8
Since √(npq) = 8, npq = 64.
Putting the value of np,
80q = 64
q = 64/80 = 0.8
Hence there is no inconsistency in the statement, since the value of q does not
exceed one.
Since q = 0.8, p will be 1 − 0.8 = 0.2
npq = 64
Putting the values of p and q, we can find n:
n × 0.8 × 0.2 = 64, so n = 64/0.16 = 400.
Hence n = 400, q = 0.8 and p = 0.2.


Illustration 50. Between the hours of 2 and 4 p.m. the average number of
phone calls per minute coming into the switchboard of a company is 2.5. Find the
probability that during one particular minute there will be no phone call at all.
(Given e^(−2) = 0.13534 and e^(−0.5) = 0.60650)  (I.C.W.A., 1976)
Solution: This is a problem of the Poisson distribution.
$$P(r) = \frac{e^{-m} m^r}{r!}$$
$$P(0) = e^{-m} = e^{-2.5}$$
Given $e^{-2} = 0.13534$ and $e^{-0.5} = 0.60650$,
$$e^{-2.5} = e^{-2} \times e^{-0.5} = 0.13534 \times 0.60650 = 0.0821$$
Hence the probability that during one particular minute there will be no phone call
at all is 0.0821.


3 Sampling and Tests of 
Significance 


In the chapter on sampling, it was pointed out that a sample is not
studied for its own sake. Instead, we are interested in sample information
for the following two reasons:


(i) For testing hypotheses. Sampling may be used to test some hypothesis
about the parent population from which the sample is drawn. In other
words, we may compare observations with some hypothesis or theoretical
expectation and discover how far the difference between the two can
be attributed to chance, i.e., fluctuations of simple sampling. This is
the problem of significance.

(ii) For estimation. That is, to use the 'statistics' obtained from
the sample as estimates of the unknown 'parameters' of the population
from which the sample is drawn. This is the problem of estimation.


However, it is possible to use samples for these purposes only if the
samples are random samples. There are various kinds of random samples,
but the basic type is the simple random sample, where every element of
the population is given an equal opportunity of being selected in the
sample.


The object of the present chapter is to describe the simpler types
of significance tests, i.e., tests of significance of statistical hypotheses.
There can be several types of hypotheses. For example, a coin
may be thrown 200 times and we may get heads 80 times and tails 120
times. We may now be interested in testing the hypothesis that the coin
is unbiased. To take another example, we may study the average
weight of 100 students of a particular college and may get the result
as 110 lb. We may now be interested in testing the hypothesis that the
sample has been drawn from a population with average weight
115 lb. Similarly, we may be interested in testing the hypothesis that the
variables in the population are uncorrelated. The procedure adopted
in testing a hypothesis is as follows:


1. Set up a hypothesis. A hypothesis is a supposition made as
a basis for reasoning.* All scientific theories are tested by setting up a
hypothesis against data of observation. If observed facts are clearly
inconsistent with a given hypothesis, it must be rejected. If the facts are not
inconsistent with the hypothesis, the hypothesis is tenable. The test of
significance may best be oriented logically around the statement of a hypothesis
with which the data may or may not conform. A statistical hypothesis
is a hypothesis concerning the parameters or form of the probability distribution
for a particular population. Various kinds of hypotheses are
possible: for example, that one value is larger than another, that one value
is smaller than another, or that the two values are not equal.

*Palmer O. Johnson has beautifully described hypotheses as "islands in the
uncharted seas of thought to be used as bases for consolidation and recuperation as we
advance into the unknown."
The conventional approach to hypothesis testing is not to construct
a single hypothesis about the population parameter, but rather to set up
two different hypotheses. These hypotheses must be constructed so that
if one hypothesis is true, the other is false, and vice versa.

The two hypotheses in a statistical test are normally referred to as:
(i) Null hypothesis, and
(ii) Alternative hypothesis.


The null hypothesis is a very useful tool in testing the significance
of difference. In its simplest form the hypothesis asserts that there is no
true difference between the sample and the population in the particular matter
under consideration (hence the word "null", which means invalid, void
or amounting to nothing) and that the difference found is accidental and
unimportant, arising out of fluctuations of sampling. The null hypothesis
is akin to the legal principle that a man is innocent until he is proved
guilty. It constitutes a challenge, and the function of the experiment
is to give the facts a chance to refute (or fail to refute) this challenge.
For example, if we want to find out whether extra coaching has benefited
the students or not, we shall set up a null hypothesis that "extra coaching
has not benefited the students". Similarly, if we want to find out whether
a particular drug is effective in curing malaria, we will take the null
hypothesis that "the drug is not effective in curing malaria". The rejection
of the null hypothesis indicates that the differences have statistical significance,
and the acceptance of the null hypothesis indicates that the differences
are due to chance. Since many practical problems aim at establishing
the statistical significance of differences, rejection of the null hypothesis
may thus indicate success in a statistical project.


As against the null hypothesis, the alternative hypothesis specifies
those values that the researcher believes to hold true, and, of course, he
hopes that the sample data lead to acceptance of this hypothesis as
true. The alternative hypothesis may embrace a whole range
of values rather than a single point. Only one alternative hypothesis
can be tested at one time against the null hypothesis. Nowadays
it is common practice not to attach any
special meaning to the null or alternative hypothesis but merely to let
these terms represent two different assumptions about the population
parameter. However, for statistical convenience it will make a difference
which hypothesis is called the null hypothesis and which is called
the alternative.


The null and alternative hypotheses are distinguished by the use of
two different symbols, $H_0$ representing the null hypothesis and $H_a$ the
alternative hypothesis. Thus a psychologist who wishes to test whether
or not a certain class of people have a mean I.Q. higher than 100 might
establish the following null and alternative hypotheses:

$H_0 : \mu = 100$ (null hypothesis)

$H_a : \mu > 100$ (alternative hypothesis)

Or, if he is interested in testing the difference between the mean I.Q.s of two
groups, this psychologist might want to establish the null hypothesis that
the two groups have equal means ($\mu_1 - \mu_2 = 0$) and the alternative
hypothesis that their means are not equal ($\mu_1 - \mu_2 \neq 0$):

$H_0 : \mu_1 - \mu_2 = 0$ (null hypothesis)
$H_a : \mu_1 - \mu_2 \neq 0$ (alternative hypothesis)


2. Set up a suitable significance level. Having set up the
hypotheses, the next step is to test the validity of $H_0$ against that of $H_a$ at
a certain level of significance. The confidence with which an experimenter
rejects, or retains, a null hypothesis depends upon the significance level
adopted. The significance level is customarily expressed as a percentage,
such as 5 per cent, 1 per cent and the like. A level of significance of,
say, 5 per cent is the probability of rejecting the null hypothesis if it is
true. When the hypothesis in question is tested at the 5 per cent level,
the statistician is running the risk that, in the long run, he will be making
the wrong decision about 5 per cent of the time. By rejecting the
hypothesis at the same level he runs the risk of rejecting a true hypothesis
in 5 out of every 100 occasions. By testing at the 1 per cent level he seeks
to reduce the chance of making a false judgment, but some element of risk
remains (1 out of 100 occasions) that he will make the wrong decision, i.e.,
he may accept where he ought to have rejected or vice versa.


3. Setting a test criterion. The third step in the general testing procedure is to construct a test criterion. This involves selecting an appropriate probability distribution for the particular test, that is, a probability distribution which can properly be applied. Some probability distributions that are commonly used in testing procedures are $t$, $F$ and $\chi^2$. Test criteria must employ an appropriate probability distribution; for example, if only small sample information is available, the use of the normal distribution would be inappropriate.

4. Doing computations. Having taken the first three steps, we have completely designed a statistical test. We now proceed to the fourth step, the performance of the various computations necessary for the test from a random sample of size $n$. These calculations include the testing statistic and the standard error of the testing statistic.


5. Making decisions. Finally, as a fifth step, we may draw statistical conclusions and make decisions. A statistical conclusion or statistical decision is a decision either to reject or not to reject the null hypothesis. The decision will depend on whether the computed value of the test criterion falls in the region of rejection or the region of acceptance. If the hypothesis is being tested at the 5% level and the observed set of results has a probability of less than 5 per cent, we consider the difference between the sample statistic and the hypothetical parameter significant. In other words, we think that the sample result is so rare that it cannot be explained by chance variation alone. We then decide to reject $H_0$ and state: “the null hypothesis is false”, or “the sample observations are not consistent with the null hypothesis” (the rejection of $H_0$ automatically leads to acceptance of $H_a$).


On the other hand, if at the 5% level of significance the observed set of results has a probability of more than 5 per cent, we reason that the difference between the sample result and the hypothetical parameter can be explained by chance variations and, therefore, is not statistically significant. Consequently, we decide not to reject $H_0$ and state: “The sample result is not inconsistent with the null hypothesis.” If the probability is about 5 per cent, the wisest course may be to reserve judgment and draw another sample, if possible.


The reader may have noted above that the rejection statement is much stronger than the acceptance statement. In other words, if the null hypothesis is not rejected, the statistician does not then categorically conclude that the hypothesis is true. The difference in attitudes arises essentially from the fact that, in logic, it is always easier to prove something false than to prove it true.


It should be clearly noted that the practical “managerial decision” is outside the responsibility of the statistician. He does not make the decision; he merely provides information on the basis of which the businessman or administrator can make his decisions.


Two Types of Errors in Testing of Hypothesis 


When a statistical hypothesis is tested there are four possible results:

1. The hypothesis is true but our test rejects it.

2. The hypothesis is false but our test accepts it.

3. The hypothesis is true and our test accepts it.

4. The hypothesis is false and our test rejects it.


Obviously, the first two possibilities lead to errors. When the result 
of the test leads to the rejection of a null hypothesis which is true 
(possibility No. 1) this is called Type I error. 


In other words, type I errors are made when we reject a null hypothesis by declaring a difference significant, although no true difference exists. On the other hand, if the result of a statistical test leads to the acceptance of a null hypothesis which is not true, this is called type II error. In other words, type II errors are made when we accept a null hypothesis by declaring a difference not significant, when a true difference actually exists.

The distinction between these two types of errors can be made clear by an example. Assume that the difference between two population means is actually zero. If our test of significance, when applied to the two sample means, leads us to believe that the difference in population means is significant, we make a type I error. On the other hand, suppose there is a true difference between the two population means. Now if our test of significance leads to the judgment “not significant”, we commit a type II error. We thus find ourselves in the situation which is described by the following table:


                 Accept H0           Reject H0
H0 is true       Correct decision    Type I error
H0 is false      Type II error       Correct decision


SAMPLING AND TESTS OF SIGNIFICANCE А-3°5 


In testing a hypothesis, we first consider how much risk we are willing to accept. This means determining the size of our rejection region, or level of significance. Statisticians have traditionally used the Greek letter alpha, $\alpha$ (level of significance, or size of rejection region), equal to 0.05 for most statistical tests. This means that a true hypothesis will be accepted 95 per cent of the time. This leaves a 5 per cent chance that we will commit an error of Type I, i.e., reject a true hypothesis. The size of the acceptance region on each side of the mean is 0.475 (or, more technically correct, a probability of 47.5 per cent) and the size of the rejection region in each tail is 0.025. If we consult the table of areas under the normal curve (Appendix 5, Table VIII), we find that an area of 0.4750 corresponds to 1.96 standard errors on each side of $\mu_{H_0}$, the hypothetical mean, and this marks the limits of the acceptance region. If the sample mean falls into this area, the hypothesis is accepted. If the sample mean falls beyond 1.96 standard errors, the hypothesis is rejected because it falls into the rejection region. The acceptance and rejection regions for testing a hypothesis at the 0.05 level of significance are given below for a two-tail test.
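The critical values 1.96 and 2.58 used in this section are normal-table look-ups, and they can be checked with Python's standard library. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `two_tail_critical_z` is our own, not from the text:

```python
from statistics import NormalDist

def two_tail_critical_z(alpha: float) -> float:
    """Return z such that P(|Z| > z) = alpha for a standard normal Z.

    Half of alpha goes into each tail, so z is the (1 - alpha/2) quantile.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01):
    z = two_tail_critical_z(alpha)
    print(f"alpha = {alpha}: acceptance region is mu_H0 +/- {z:.2f} S.E.")
```

Rounded to two decimals this prints 1.96 for $\alpha = 0.05$ and 2.58 for $\alpha = 0.01$, the same limits as the table.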


[Figure: ACCEPTANCE AND REJECTION REGIONS ($\alpha = 0.05$), two-tail test. Acceptance region centred on $\mu_{H_0}$, with a rejection region of area 0.025 in each tail.]


It will be clear from the diagram that in a two-tail test the rejection regions are located in both tails.


It is possible for a true hypothesis to fall into the rejection region and for a false hypothesis to fall into the acceptance region. At the 0.05 level of significance the probability is 5 per cent that a true hypothesis will fall into the rejection region. This rejection of a true hypothesis is called an error of Type I.

Suppose we want to reduce the risk of committing an error of Type I. This is done by reducing the size of the rejection region. For this, a hypothesis may be tested at the 0.01 level of significance, which means that the probability of rejecting a true hypothesis is 1 per cent. If we consult the table of areas under the normal curve (Appendix 5, Table VIII), we find that an acceptance region of 0.495 on each side (one half of 0.99) corresponds to 2.58 standard errors from $\mu_{H_0}$. The acceptance and rejection regions at the 0.01 level of significance are given in the following figure:
[Figure: ACCEPTANCE AND REJECTION REGIONS ($\alpha = 0.01$). Acceptance region from $\mu_{H_0} - 2.58\sigma$ to $\mu_{H_0} + 2.58\sigma$, with a rejection region of area 0.005 in each tail.]


It will be clear from the above figure that as we decrease the size of the rejection region, we increase the probability of accepting our hypothesis. At $\alpha = 0.01$ the probability of rejecting a true hypothesis is 1 per cent.


other than random factors. This increases the probability of accepting a 
null hypothesis when it is false. 


increase the sample size. Of course, neither error can ever be completely eliminated. In practice, people generally prefer to use the 0.05 level of significance.
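The trade-off between the two types of error can be made concrete. The sketch below works under assumed conditions: a two-tail z-test, and a true mean that differs from $\mu_{H_0}$ by a fixed amount (0.4 population standard deviations, a hypothetical figure). `type_ii_error` is an illustrative helper, not from the text. It computes the probability of accepting a false $H_0$, so you can see that this probability rises when $\alpha$ is cut from 0.05 to 0.01 and falls as the sample size grows.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_ii_error(alpha: float, shift_in_se: float) -> float:
    """P(accept H0) in a two-tail z-test when the true mean lies
    `shift_in_se` standard errors away from the hypothetical mean."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Under the true mean the test statistic is N(shift_in_se, 1);
    # H0 is accepted when it falls between -z_crit and +z_crit.
    return STD_NORMAL.cdf(z_crit - shift_in_se) - STD_NORMAL.cdf(-z_crit - shift_in_se)

effect = 0.4  # hypothetical: true mean exceeds mu_H0 by 0.4 population S.D.
for n in (25, 100):
    shift = effect * sqrt(n)  # the same shift measured in standard errors, sigma/sqrt(n)
    for alpha in (0.05, 0.01):
        beta = type_ii_error(alpha, shift)
        print(f"n = {n:3d}, alpha = {alpha}: P(Type II error) = {beta:.3f}")
```

With $n = 25$ the Type II risk is noticeably larger at $\alpha = 0.01$ than at $\alpha = 0.05$. Quadrupling $n$ to 100 cuts it sharply at both levels, which is why a larger sample is the remedy for both errors.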


The test procedure given above is called a two-tail test because the rejection region is located in both tails: we reject the null hypothesis if the sample mean is either too small or too large compared with the hypothetical mean.


hypothesis formulated. For example, if we are interested in testing a hypothesis that the average income per household is greater than Rs. 1,000, we will place all the alpha risk on the right side of our theoretical sampling distribution and the test will be a one-sided right test. On the other hand, if we are testing the hypothesis that the average income per household is Rs. 1,000 or less, the alpha risk is on the left side of our theoretical sampling distribution and the test will be a one-sided left test. The following two figures illustrate one-tail right and one-tail left tests:


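[Figure: ACCEPTANCE AND REJECTION REGIONS (ONE-TAIL TEST, RIGHT, $\alpha = 0.05$). Acceptance region on the left; rejection region in the right tail.]

[Figure: ACCEPTANCE AND REJECTION REGIONS (ONE-TAIL TEST, LEFT, $\alpha = 0.05$). Rejection region in the left tail; acceptance region on the right.]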
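In a one-tail test the whole of $\alpha$ sits in one tail, so the critical value is the $(1-\alpha)$ quantile instead of the $(1-\alpha/2)$ quantile. Here is a small sketch, again assuming `statistics.NormalDist`; `one_tail_critical_z` is an illustrative name.

```python
from statistics import NormalDist

def one_tail_critical_z(alpha: float, tail: str = "right") -> float:
    """Critical z for a one-tail test with all of alpha in one tail."""
    z = NormalDist().inv_cdf(1 - alpha)
    if tail == "right":
        return z    # reject H0 if the test statistic exceeds z
    if tail == "left":
        return -z   # reject H0 if the test statistic is below -z
    raise ValueError("tail must be 'right' or 'left'")

print(round(one_tail_critical_z(0.05), 3))          # 1.645
print(round(one_tail_critical_z(0.05, "left"), 3))  # -1.645
```

At the 5% level a one-tail test therefore rejects beyond 1.645 S.E. on one side. A two-tail test at the same level needs 1.96 S.E. on either side.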


Standard Error and Sampling Distribution 


Before discussing the various types of tests of significance let us 
acquaint ourselves with the concept of standard error which is of funda- 
mental importance in testing hypothesis. 


The standard deviation of the sampling distribution is called the standard error.* It is so called because it measures the sampling variability due to chance or random forces. Hence, to clarify the term standard error, it is necessary to describe a sampling distribution. If we select a number of independent random samples of a definite size from a given population and calculate some statistic (like the mean, standard deviation, etc.) from each sample, we shall get a series of values of that statistic. These values obtained from the different samples can be put in the form of a frequency distribution. The distribution so formed of all possible values of a statistic is called the sampling distribution or the probability distribution of that statistic. Thus if we draw 100 random samples from a given population and calculate their means, we shall get a series of 100 means which would form a frequency distribution. This distribution will be known as the sampling distribution of the means.


The following diagram gives the sampling distribution of sample means based on a large number of sample means:

* If the universe distribution is not normal, the sampling distribution of the sample means approaches normality as the sample size increases.
[Figure: SAMPLING DISTRIBUTION]
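A short simulation shows what a sampling distribution is, and why the standard error is its standard deviation. The sketch rests on stated assumptions. The population is a hypothetical exponential distribution with mean 1 and standard deviation 1, chosen to be non-normal in the spirit of the footnote. The sample size (25) and the number of samples (2,000) are arbitrary choices.

```python
import random
from statistics import mean, stdev

def sampling_distribution_of_mean(draw, n, num_samples, rng):
    """Means of `num_samples` independent random samples, each of size n."""
    return [mean(draw(rng) for _ in range(n)) for _ in range(num_samples)]

rng = random.Random(42)  # fixed seed so the run is reproducible
means = sampling_distribution_of_mean(
    lambda r: r.expovariate(1.0),  # hypothetical population: mean 1, S.D. 1
    n=25, num_samples=2000, rng=rng,
)
print(f"mean of the sample means:  {mean(means):.3f}")   # should be near 1
print(f"S.D. of the sample means:  {stdev(means):.3f}")  # the standard error
print(f"theoretical sigma/sqrt(n): {1 / 25 ** 0.5:.3f}")
```

The 2,000 means form the sampling distribution of the mean. Their standard deviation comes out close to $\sigma/\sqrt{n} = 0.2$.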


In a similar manner we can talk of the sampling distributions of the standard deviation, the coefficient of correlation and other statistical measures. The sampling distribution of a statistic reveals some important features:

1. First, a sampling distribution is generated from a population distribution, known or assumed.

2. Secondly, the same population may generate an infinite number of sampling distributions for the statistic, one for each sample size $n$.

3. Finally, a population may generate sampling distributions for two or more different statistics.


The concept of standard error is of great significance in statistical work for the following reasons:

and expected results is less than 1.96 S.E., it is not regarded as significant, i.e., it could have arisen due to fluctuations


the hypothesis. If the difference is more than 2.58 S.E., it is considered to be significant at the 1% level. In practice a hypothesis is quite often accepted if the difference is less than 3 S.E., because the probability of a difference greater than 3 S.E. arising by chance is only about 3 in a thousand (0.27%), as 99.73 per cent of items are covered within mean ± 3σ. However, it must be emphasised that the use of the 3σ rule is justified only if $n$ is large. Some people apply a criterion of 2 S.E. in order to determine whether or not the difference could have arisen due to fluctuations of sampling. However, instead of 3 S.E. or 2 S.E., it is suggested that we should use either the 5% level or the 1% level of significance. (In practice the 5% level is more popular.)

(ii) Standard error provides an idea about the unreliability of a sample. The greater the standard error, the greater is the departure of actual frequencies from the expected ones and hence the greater the unreliability of the sample. The reciprocal of S.E., i.e., $1/\text{S.E.}$, is a measure of reliability or precision of the sample. The reliability or precision of an observed proportion varies as the square root of the number of items in the sample. In other words, if we want to double the precision (which is the same thing as reducing the standard error to one-half), the number of observations should be increased four times.

(iii) With the help of S.E. we can determine the limits within which the parameter values are expected to lie. This is possible because, for large samples, sampling distributions tend to approximate a normal distribution. In a normal distribution 68.27% of the samples will have their mean values (or the values of any other statistic) within a range of the population mean ± 1 standard deviation, or standard error as it is alternatively called. Similarly, a range of mean ± 2 S.E. will give 95.45 per cent of the values and mean ± 3 S.E. will give 99.73 per cent of the values. Thus a range of ± 3 S.E. should be taken as the determining limit outside which the value of the parameter probably does not fall. The chance of a value lying outside the ± 3 S.E. limits is only 0.27% (i.e., approximately 3 in 1,000).
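The numerical claims in points (ii) and (iii) can both be checked directly. The sketch below assumes `statistics.NormalDist` and uses a hypothetical proportion $p = 0.1$ for the precision part. `coverage` and `se_of_proportion` are illustrative helper names.

```python
from math import sqrt
from statistics import NormalDist

def coverage(k: float) -> float:
    """Proportion of a normal distribution lying within mean +/- k S.E."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

def se_of_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion, sqrt(p q / n)."""
    return sqrt(p * (1 - p) / n)

# Point (iii): share of a normal sampling distribution inside mean +/- k S.E.
for k in (1, 2, 3):
    inside = coverage(k)
    print(f"mean +/- {k} S.E.: {inside:.2%} inside, {1 - inside:.2%} outside")

# Point (ii): quadrupling n halves the S.E., i.e. doubles the precision 1/S.E.
p = 0.1  # hypothetical population proportion
for n in (100, 400, 1600):
    se = se_of_proportion(p, n)
    print(f"n = {n:4d}: S.E. = {se:.4f}, precision 1/S.E. = {1 / se:.1f}")
```

The first loop reproduces the 68.27, 95.45 and 99.73 per cent figures, along with the 0.27 per cent outside ± 3 S.E. The second loop shows the S.E. halving each time $n$ is multiplied by four.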

Estimation 

Estimation allows us to infer universe (population) values on the basis of sample values and to make decisions based on the population estimates. In other words, estimation enables us to ascertain the value of the unknown parameter $\theta$. With respect to estimating a parameter, there are two types of estimates to be considered:

(1) Point estimates, and 

(2) Interval estimates. 

Point estimates. The procedure in point estimation is to select a random sample of $n$ observations, $x_1, x_2, \ldots, x_n$, from a population $f(x; \theta)$ and then to use some preconceived method to arrive, from these observations, at a number, say $\hat{\theta}$ (read theta hat), which we accept as an estimator of $\theta$. The estimator $\hat{\theta}$ is a single point on the real number scale, hence the name point estimation.


Interval estimates. Interval estimation refers to the estimation of a parameter by a random interval, called the confidence interval, whose endpoints $L$ and $U$, with $L < U$, are functions of the observed random variables such that the probability that the inequality $L < \theta < U$ is satisfied is a predetermined number, $1 - \alpha$. $L$ and $U$ are called the confidence limits and are the random endpoints of a random interval.
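A simulation can illustrate what it means for the interval to be random. The endpoints $L$ and $U$ change from sample to sample, while $\theta$ stays fixed. The sketch assumes a hypothetical normal population with known $\sigma$, so that $L, U = \bar{x} \mp 1.96\,\sigma/\sqrt{n}$ and $1 - \alpha = 0.95$. It then counts how often the interval covers the true mean. `normal_ci_for_mean` is an illustrative helper.

```python
import random
from math import sqrt
from statistics import mean

def normal_ci_for_mean(sample, sigma, z=1.96):
    """Confidence limits (L, U) for the mean when sigma is known."""
    half_width = z * sigma / sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width

rng = random.Random(7)          # fixed seed for reproducibility
mu, sigma, n = 50.0, 10.0, 25   # hypothetical population and sample size
trials = 2000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = normal_ci_for_mean(sample, sigma)
    if low < mu < high:
        covered += 1
print(f"intervals covering mu: {covered / trials:.1%}")  # should be near 95%
```

Each trial produces a different interval. In the long run about 95 per cent of them contain $\mu$, which is exactly what the predetermined probability $1 - \alpha$ refers to.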
On comparing these two methods of estimation we find that point estimation has an advantage inasmuch as it provides an exact value for


Properties of a Good Estimator 


A good estimator, as common sense dictates, is close to the parameter being estimated. Its quality is to be evaluated in terms of the following properties:

1. Unbiasedness. An estimator is said to be unbiased if its expected value is identical with the population parameter being estimated. That is, if $\hat{\theta}$ is an unbiased estimate of $\theta$, then we must have $E(\hat{\theta}) = \theta$. Many estimators are “asymptotically unbiased” in the sense that the biases reduce to practically insignificant values when $n$ becomes sufficiently large. The estimator $S^2$ is an example.

It should be noted that bias in estimation is not necessarily undesirable. It may turn out to be an asset in some situations. For example, it

2. Consistency. An estimator $\hat{\theta}$ is said to be consistent for $\theta$ if the limit of the probability that $|\hat{\theta} - \theta| < \varepsilon$ is unity as the sample size approaches infinity, i.e., $\lim_{n \to \infty} P(|\hat{\theta} - \theta| < \varepsilon) = 1$ for every $\varepsilon > 0$. In other words, an estimator is said to be consistent if the probability that it approaches the parameter being estimated is 1 as $n$ approaches infinity.

The sample mean is an unbiased estimator of $\mu$ no matter what form the population distribution assumes, while the sample median is an unbiased estimate of $\mu$ only if the population distribution is symmetrical. The sample mean is better than the sample median as an estimate of $\mu$ in terms of both unbiasedness and consistency.


In the case of large samples, consistency is a desirable property for an estimator to possess. However, in small samples, consistency is of little importance unless the limit of probability defining consistency is reached even with a relatively small size of the sample.


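3. Efficiency. Given that estimators are unbiased, or at least consistent, an estimator $\hat{\theta}_1$ is said to be more efficient than another estimator $\hat{\theta}_2$ if the variance of $\hat{\theta}_1$ is smaller than that of $\hat{\theta}_2$.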
If the population is symmetrically distributed, then both the sample mean and the sample median are consistent and unbiased estimators of $\mu$. Yet the sample mean is better than the sample median as an estimator of $\mu$. This claim is made in terms of efficiency.

4. Sufficiency. An estimator is said to be sufficient if it conveys as much information as is possible about the parameter which is contained in the sample, so that no additional information is supplied by any other estimator. The significance of sufficiency lies in the fact that if a sufficient estimator exists, it is absolutely unnecessary to consider any other non-sufficient estimators; a sufficient estimator ensures that all the information a sample can furnish with respect to the estimation of a parameter is being utilised.
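The efficiency claim for the mean versus the median can be checked by simulation. The sketch assumes a hypothetical normal population (mean 0, S.D. 1), samples of size 25 and 2,000 repetitions. Both estimators centre on $\mu$, but the median scatters more.

```python
import random
from statistics import mean, median, pvariance

rng = random.Random(2024)  # fixed seed for reproducibility
n, reps = 25, 2000
means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(mean(sample))
    medians.append(median(sample))

ratio = pvariance(medians) / pvariance(means)
print(f"variance of medians / variance of means = {ratio:.2f}")
```

A ratio above 1 means the median needs more observations to reach the precision of the mean. For large samples from a normal population this ratio approaches $\pi/2 \approx 1.57$.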

Many methods have been devised for finding estimators that satisfy these properties. The two important methods are the method of least squares and the method of maximum likelihood.

Having discussed the above concepts, let us now discuss the various situations where we have to apply the various tests of significance. For the sake of convenience and clarity these situations may be summed up under the following three heads:

I. Sampling of Attributes.

II. Sampling of Variables (Large Samples).

III. Sampling of Variables (Small Samples).

I. SAMPLING OF ATTRIBUTES

As distinguished from variables, where quantitative measurement of a phenomenon is possible, in the case of attributes we can only find out the presence or absence of a particular characteristic. The sampling of attributes may, therefore, be regarded as the drawing of samples from a population whose members possess the attribute A or not A. For example, in the study of the attribute ‘Literacy’ a sample may be taken and people classified as literates and illiterates. With such data a binomial type of problem may be formed. The selection of an individual on sampling may be called an ‘event’, the appearance of attribute A may be taken as ‘success’ and its non-appearance as ‘failure’. Thus if out of 1,000 people selected for the sample, 100 are found literate and 900 illiterate, we would say that the sample consists of 1,000 events, out of which 100 are successes and 900 failures. The probability of success is $p = 100/1{,}000 = 0.1$ and the probability of failure is $q = 900/1{,}000 = 0.9$, so that $p + q = 0.1 + 0.9 = 1$.

The sampling distribution of the number of successes, being a binomial probability model, would have as its mean $\mu = np$ and as its variance $\sigma^2 = npq$, or $\sigma = \sqrt{npq}$.

Standard Error of Number of Successes 


If we have $N$ samples with $n$ events in each, and the chance of success in each event is $p$ and of failure $q$, i.e., $(1 - p)$, the standard error of the number of successes is given by the formula:

S.E. of number of successes $= \sqrt{npq}$
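The formulas $\mu = np$ and $\sigma = \sqrt{npq}$ can be confirmed by computing the binomial distribution exactly. The sketch uses the literacy example above ($n = 1000$, $p = 0.1$). `binomial_mean_sd` is an illustrative helper, and `math.comb` needs Python 3.8 or later. The exact sum is only practical for moderate $n$, since the binomial coefficients must fit in a float.

```python
from math import comb, sqrt

def binomial_mean_sd(n, p):
    """Mean and S.D. of the number of successes, summed from the exact pmf."""
    q = 1 - p
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mu = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mu) ** 2 * w for k, w in enumerate(pmf))
    return mu, sqrt(var)

mu, sd = binomial_mean_sd(1000, 0.1)
print(f"exact:   mean = {mu:.2f}, S.E. = {sd:.3f}")
print(f"formula: np = {1000 * 0.1:.2f}, sqrt(npq) = {sqrt(1000 * 0.1 * 0.9):.3f}")
```

Both lines agree: a mean of 100 successes with a standard error of $\sqrt{90}$, about 9.487.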


Illustration 1. In 324 throws of a six-faced die, odd points appeared 181 times. Would you say that the die is “fair”? State carefully the property on which you base your calculation.

Solution. Let us take the hypothesis that the die is fair. On the basis of this hypothesis, the probability of getting odd points and even points should be the same, i.e., $\tfrac{1}{2}$. Hence the expected number of odd points in 324 throws $= 324 \times \tfrac{1}{2} = 162$. The deviation of the actual from the expected number of odd points $= 181 - 162 = 19$.

S.E. of number of odd points $= \sqrt{npq}$

$n = 324,\quad p = \tfrac{1}{2},\quad q = \tfrac{1}{2}$

$\text{S.E.} = \sqrt{324 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 9$

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{19}{9} = 2.11$

Since the difference between the observed and expected numbers of odd points is less than 2.58 S.E., it is not significant at the 1 per cent level. However, at the 5% level, since the difference exceeds 1.96 S.E., our conclusion will be reversed. In other words, at the 1 per cent level our conclusion is that the die is fair, but at the 5 per cent level the conclusion is that the die is not fair.
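The arithmetic of Illustration 1 can be reproduced in a few lines. The sketch relies on the normal approximation to the binomial, which is what justifies comparing Difference/S.E. with 1.96 and 2.58. `z_for_count` is an illustrative helper.

```python
from math import sqrt

def z_for_count(observed: int, n: int, p: float) -> float:
    """(observed - expected) / S.E. for a number of successes, S.E. = sqrt(npq)."""
    expected = n * p
    se = sqrt(n * p * (1 - p))
    return (observed - expected) / se

z = z_for_count(181, 324, 0.5)  # (181 - 162) / 9
for level, crit in (("5%", 1.96), ("1%", 2.58)):
    verdict = "significant" if abs(z) > crit else "not significant"
    print(f"|z| = {abs(z):.2f} vs {crit} ({level} level): {verdict}")
```

It prints $|z| = 2.11$, significant at the 5% level but not at the 1% level. These are the two conclusions reached in the text.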

Illustration 2. 160 heads and 240 tails were obtained in tossing a coin 400 times. Find a 95 per cent confidence interval for the probability of a head. Does this appear to be a true coin?

Solution. Let us take the hypothesis that the coin is unbiased. On the basis of this hypothesis the chance of getting a head in a toss is $\tfrac{1}{2}$ and, therefore, the expected number of heads in 400 tosses $= 400 \times \tfrac{1}{2} = 200$. The observed number of heads is 160. The deviation of the actual number of heads from the expected number $= 200 - 160 = 40$.

S.E. of the number of heads $= \sqrt{npq}$

$n = 400,\quad p = \tfrac{1}{2},\quad q = \tfrac{1}{2}$

$\text{S.E.} = \sqrt{400 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 10$

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{40}{10} = 4$

Since the difference between the observed and expected number of heads is more than 2.58 S.E. (1% level of significance), the result of the experiment does not support the hypothesis that the coin is unbiased. Hence the coin does not appear to be true.
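The question also asks for a 95 per cent confidence interval, which the worked solution does not compute. Here is a minimal sketch. It assumes the usual large-sample interval $\hat{p} \pm 1.96\sqrt{\hat{p}\hat{q}/n}$, built from the sample proportion; this choice of interval is our assumption, since the text does not state which one it intends.

```python
from math import sqrt

n, heads = 400, 160
p_hat = heads / n                              # sample proportion of heads, 0.4

# Test of H0: p = 1/2, using S.E. = sqrt(npq) for the number of heads.
z = (heads - n * 0.5) / sqrt(n * 0.5 * 0.5)    # (160 - 200) / 10

# 95% interval for p, with the S.E. estimated from the sample proportion.
se_hat = sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat
print(f"z = {z:.1f}; 95% interval for P(head): {low:.3f} to {high:.3f}")
```

The interval runs from about 0.352 to 0.448. It excludes 0.5, which agrees with the conclusion that the coin does not appear to be true.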


Illustration 3. In a sample of 500 people from Andhra Pradesh, 280 are found to be rice eaters and the rest wheat eaters. Can we assume that both the food articles are equally popular?

Solution. Let us take the hypothesis that the food articles are equally popular. The expected frequency for wheat eaters and rice eaters would be $500 \times \tfrac{1}{2} = 250$ each.

$\text{S.E.} = \sqrt{npq} = \sqrt{500 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 11.18$

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{280 - 250}{11.18} = \dfrac{30}{11.18} = 2.68$
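The worked solution stops after forming the ratio. The sketch below finishes the arithmetic under the same large-sample rule used in Illustrations 1 and 2, comparing Difference/S.E. with 2.58. The verdict it prints is our reading of that rule, not text from the original.

```python
from math import sqrt

n, rice_eaters, p = 500, 280, 0.5
se = sqrt(n * p * (1 - p))            # sqrt(125), about 11.18
z = (rice_eaters - n * p) / se        # 30 / 11.18
print(f"S.E. = {se:.2f}, Difference/S.E. = {z:.2f}")
print("significant at 1% level" if abs(z) > 2.58 else "not significant at 1% level")
```

Since 2.68 exceeds 2.58, the rule would reject the hypothesis that rice and wheat are equally popular, even at the 1% level.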


Illustration 4. In a hospital 480 female and 520 male babies were born in a week. Do these figures confirm the hypothesis that males and females are born in equal number?

Solution. Let us take the hypothesis that male and female babies are born in equal number, i.e., $p = q = \tfrac{1}{2}$.

$\text{S.E.} = \sqrt{npq} = \sqrt{1000 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 15.81$

Difference between observed and expected number of male babies $= 520 - 500 = 20$

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{20}{15.81} = 1.265$

Since the difference is less than 1.96 S.E. (5% level), it can be concluded that male and female babies are born in equal number.


Standard Error of the Proportion of Successes

Instead of recording the number of successes in each sample, we might record the proportion of successes, that is, $1/n$ of the number of successes in each sample. As this amounts to dividing all figures of the record by $n$, the mean proportion of successes must be $p$, and the standard deviation of the proportion of successes $\sqrt{pq/n}$. Thus we have the following formula:

$\text{S.E.}_p = \sqrt{\dfrac{pq}{n}}$


Illustration 5. 500 apples are taken at random from a large basket and 50 are found to be bad. Estimate the proportion of bad apples in the basket and assign limits within which the percentage most probably lies. (M.A. Econ., Punjab, 1973)

Solution. The proportion of bad apples in the given sample $= \dfrac{50}{500} = 0.1$

Hence $p = 0.1$ and $q = 0.9$

$\text{S.E.}_p = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{0.1 \times 0.9}{500}} = \sqrt{0.00018} = 0.013$

The limits within which the percentage of bad apples lies

$= [p \pm 3\,\text{S.E.}_p] \times 100$

$= [0.1 \pm 3(0.013)] \times 100$

$= [0.1 \pm 0.039] \times 100$

$= 6.1$ and $13.9$

Thus the percentage of bad apples in the consignment almost certainly lies between 6.1 and 13.9.
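The same limits can be computed without rounding the S.E. This is a minimal sketch of the $p \pm 3\,\text{S.E.}_p$ rule used above.

```python
from math import sqrt

n, bad = 500, 50
p = bad / n                     # 0.1
se = sqrt(p * (1 - p) / n)      # S.E. of a proportion, sqrt(pq/n)
low, high = p - 3 * se, p + 3 * se
print(f"S.E. = {se:.4f}; limits: {low:.2%} to {high:.2%}")
```

Unrounded, the S.E. is 0.0134 and the limits are 5.98 and 14.02 per cent. The text rounds the S.E. to 0.013, which gives its slightly narrower 6.1 to 13.9.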
Illustration 6. A sample of 100 days is taken from the meteorological records of a certain district and, of them, 10 are found to be foggy. What are the probable limits of the percentage of foggy days in the district?

Solution. The proportion, $p$, of foggy days in the sample is $\dfrac{10}{100} = 0.1$, i.e., 10 per cent.

S.E. of the proportion of foggy days $= 0.0134 = 1.34$ per cent

Probable limits of the percentage of foggy days in the district

$= p \pm 3\,\text{S.E.}$

$= 10 \pm 3(1.34)$

$= 10 \pm 4.02$

$= 5.98$ to $14.02$, or 6 to 14.


Standard Error of the Difference between the Proportions

If two samples are drawn from different populations, we may be interested in finding out whether the difference between the proportions of successes is significant or not. In such a case we take the hypothesis that the difference between $p_1$, i.e., the proportion of successes in one sample, and $p_2$, i.e., the proportion of successes in another sample, is due to fluctuations of random sampling. The standard error of the difference between proportions is calculated by applying the following formula:

$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

where $p$ = the best estimate of the actual proportion in the population, and $q = 1 - p$. The value of $p$ is obtained as follows:

$p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$

If the difference $p_1 - p_2$ is less than 1.96 S.E. (5% level of significance), the difference is regarded as due to random sampling variation, i.e., as not significant.
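The pooled formula translates directly into a function. This is a minimal sketch; `two_proportion_z` is an illustrative name, and the check uses the data of Illustration 7.

```python
from math import sqrt

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """(p1 - p2) / S.E., where S.E. = sqrt(p q (1/n1 + 1/n2)) uses the pooled p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (n1 * p1 + n2 * p2) / (n1 + n2)   # best estimate of the common proportion
    q = 1 - p
    se = sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(400, 1000, 400, 800)   # data of Illustration 7
print(f"Difference/S.E. = {z:.2f}")
```

The text rounds the S.E. to 0.024 and reports 4.17. Without rounding, the ratio is about 4.24 in absolute value. Either way it is well above 2.58.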
The following illustration shall explain the method : 


Illustration 7. In a random sample of 1,000 persons from town A, 400 are found to be consumers of wheat. In a sample of 800 from town B, 400 are found to be consumers of wheat. Do these data reveal a significant difference between town A and town B, so far as the proportion of wheat consumers is concerned?
(M. A., Econ., Punjab, 1975)

Solution. Let us take the hypothesis that the two towns do not differ so far as the proportion of wheat consumers is concerned.

Computing the standard error of the difference of proportions:

$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

$n_1 = 1000,\quad p_1 = \dfrac{400}{1000} = 0.4$

$n_2 = 800,\quad p_2 = \dfrac{400}{800} = 0.5$

$p = \dfrac{(1000 \times 0.4) + (800 \times 0.5)}{1000 + 800} = \dfrac{800}{1800} = \dfrac{4}{9}, \qquad q = 1 - \dfrac{4}{9} = \dfrac{5}{9}$

$\text{S.E.}(p_1 - p_2) = \sqrt{\dfrac{4}{9} \times \dfrac{5}{9}\left(\dfrac{1}{1000} + \dfrac{1}{800}\right)} = \sqrt{\dfrac{20}{81} \times \dfrac{9}{4000}} = 0.024$

$p_1 - p_2 = 0.4 - 0.5 = -0.1$*

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{0.1}{0.024} = 4.17$

Since the difference is more than 2.58 S.E. (1% level of significance), it could not have arisen due to fluctuations of sampling. Hence the data reveal a significant difference between town A and town B so far as the proportion of wheat consumers is concerned.


Illustration 8. In a random sample of 500 persons from Maharashtra, 200 are found to be consumers of vegetable oil. In another sample of 400 persons from Gujarat, 200 are found to be consumers of vegetable oil. Discuss whether the data reveal a significant difference between Maharashtra and Gujarat so far as the proportion of vegetable oil consumers is concerned. (M. Com., Delhi, 1973)


* $p_1 - p_2 = 0.4 - 0.5 = -0.1$. However, the conclusions remain the same irrespective of whether the difference is positive or negative.
Solution. Let us take the hypothesis that there is no difference between Maharashtra and Gujarat so far as the proportion of vegetable oil consumers is concerned, i.e., $p_1 = p_2$ (null hypothesis).

Computing the standard error of the difference of proportions:

$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}, \qquad p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad q = 1 - p$

$n_1 = 500,\quad p_1 = \dfrac{200}{500} = 0.4$

$n_2 = 400,\quad p_2 = \dfrac{200}{400} = 0.5$

$p = \dfrac{(500 \times 0.4) + (400 \times 0.5)}{500 + 400} = \dfrac{400}{900} = \dfrac{4}{9}, \qquad q = \dfrac{5}{9}$

$\text{S.E.}(p_1 - p_2) = \sqrt{\dfrac{4}{9} \times \dfrac{5}{9}\left(\dfrac{1}{500} + \dfrac{1}{400}\right)} = \sqrt{\dfrac{20}{81} \times \dfrac{9}{2000}} = \sqrt{0.0011} = 0.033$

$p_1 - p_2 = 0.4 - 0.5 = -0.1$

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{0.1}{0.033} = 3.03$

Since the difference is more than 2.58 S.E. at the 1% level of significance, our hypothesis does not hold good. Hence there is a significant difference between Maharashtra and Gujarat so far as the proportion of vegetable oil consumers is concerned.


Illustration 9. In a random sample of 500 persons in the city X, 300 are found to be consumers of tea. In another sample of 1,000 people from the city Y, 550 are consuming tea. Do the data reveal a significant difference between X and Y so far as the habit of taking tea is concerned?

Solution. Let us take the hypothesis that the two cities X and Y do not differ so far as the habit of taking tea is concerned.

Computing S.E. of the difference of proportions:

$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

$n_1 = 500,\quad p_1 = \dfrac{300}{500} = 0.60$

$n_2 = 1000,\quad p_2 = \dfrac{550}{1000} = 0.55$

$p = \dfrac{(500 \times 0.6) + (1000 \times 0.55)}{500 + 1000} = \dfrac{850}{1500} = \dfrac{17}{30}, \qquad q = 1 - \dfrac{17}{30} = \dfrac{13}{30}$

$\text{S.E.}(p_1 - p_2) = \sqrt{\dfrac{17}{30} \times \dfrac{13}{30}\left(\dfrac{1}{500} + \dfrac{1}{1000}\right)} = 0.027$

$p_1 - p_2 = 0.60 - 0.55 = 0.05$

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{0.05}{0.027} = 1.85$

Since the difference is less than 1.96 S.E. (5% level of significance), it could have arisen due to sampling fluctuations. The data do not reveal a significant difference between cities X and Y so far as the habit of taking tea is concerned.
Illustration 10. A machine puts out 10 imperfect articles in a sample of 200. After the machine is overhauled it puts out 4 imperfect articles in a batch of 100. Has the machine been improved?

Solution. Let us take the hypothesis that the machine has not been improved.

$p_1$ = proportion of imperfect articles in the first sample $= \dfrac{10}{200} = 0.05$

$p_2$ = proportion of imperfect articles in the second sample $= \dfrac{4}{100} = 0.04$

$p_1 - p_2 = 0.05 - 0.04 = 0.01$

$p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{(200 \times 0.05) + (100 \times 0.04)}{200 + 100} = \dfrac{14}{300}, \qquad q = 1 - \dfrac{14}{300} = \dfrac{286}{300}$

S.E. of the difference between the proportions:

$\text{S.E.}(p_1 - p_2) = \sqrt{\dfrac{14}{300} \times \dfrac{286}{300}\left(\dfrac{1}{200} + \dfrac{1}{100}\right)} = 0.0258$

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{0.01}{0.0258} = 0.39$

Since the difference in proportions is less than 1.96 S.E. (5% level), our hypothesis holds, i.e., the machine has not improved.


Illustration 11. 500 articles from a factory are examined and found to be 2 per cent defective. 800 similar articles from a second factory are found to have only 1.5 per cent defectives. Can it reasonably be concluded that the products of the first factory are inferior to those of the second?

Solution. $p_1$ = proportion of defectives in the first factory $= 0.020$

$p_2$ = proportion of defectives in the second factory $= 0.015$

$p_1 - p_2 = 0.020 - 0.015 = 0.005 = 0.5$ per cent

Let us take the hypothesis that the products of the two factories are similar.

$p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{(500 \times 0.02) + (800 \times 0.015)}{500 + 800} = \dfrac{22}{1300} = 0.017$, or 1.7 per cent

$q = 100 - p = 98.3$ per cent

S.E. of the difference of proportions (in per cent):

$\text{S.E.}(p_1 - p_2) = \sqrt{1.7 \times 98.3\left(\dfrac{1}{500} + \dfrac{1}{800}\right)} = 0.737$ per cent

$\dfrac{\text{Difference}}{\text{S.E.}} = \dfrac{0.5}{0.737} = 0.68$

Since the difference in the percentage of defective articles is less than 1.96 S.E. (5% level), we conclude that the products of the two factories are similar.
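Illustrations 7 to 11 all use the same pooled formula, so all five can be recomputed in one pass. This is a sketch, and `pooled_z` is our own helper name. Because it does not round the S.E., some ratios differ slightly from the text's: about 4.24 rather than 4.17 for Illustration 7, and 1.84 rather than 1.85 for Illustration 9. None of the conclusions drawn in the text changes.

```python
from math import sqrt

def pooled_z(p1, n1, p2, n2):
    """(p1 - p2) / sqrt(p q (1/n1 + 1/n2)) with the pooled estimate p."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# (illustration, p1, n1, p2, n2), taken from Illustrations 7 to 11
cases = [
    (7, 400 / 1000, 1000, 400 / 800, 800),
    (8, 200 / 500, 500, 200 / 400, 400),
    (9, 300 / 500, 500, 550 / 1000, 1000),
    (10, 10 / 200, 200, 4 / 100, 100),
    (11, 0.020, 500, 0.015, 800),
]
results = {}
for label, p1, n1, p2, n2 in cases:
    z = pooled_z(p1, n1, p2, n2)
    results[label] = z
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"Illustration {label}: |z| = {abs(z):.2f} -> {verdict} at 5% level")
```

Illustrations 7 and 8 exceed even 2.58. Illustrations 9, 10 and 11 stay below 1.96.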
At times we may be interested in comparing the proportion of persons possessing an attribute in a sample with the proportion given by the population. In such a case the following formula is applicable:

$\text{S.E.}(p_1 - p_0) = \sqrt{p_0 q_0 \times \dfrac{n_2}{n_1(n_1 + n_2)}}$

where $p_1$ = proportion in the sample
$p_0$ = population proportion
$q_0 = 1 - p_0$
$n_1$ = number of observations in the sample
$n_1 + n_2$ = size of population
$n_2$ = (size of population $- \, n_1$)
Illustration 12. There are 1,000 students in a college, out of 20,000 in the whole university. In a study, 200 were found to be smokers in the college and 1,000 in the whole university. Is there a significant difference between the proportion of smokers in the college and the university?
Solution. Let us take the hypothesis that there is no significant difference in the proportion of smokers in the college and the university. Proportion of smokers in the college:

$$p_1 = \frac{200}{1000} = 0.2$$

Difference between the two proportions $= 0.2 - 0.05 = 0.15$

$p_0$, i.e., proportion of smokers in the university $= \frac{1000}{20000} = 0.05$
$q_0 = 1 - p_0 = 0.95$, $n_1 = 1000$, $n_1 + n_2 = 20000$

$$\text{S.E. of difference} = \sqrt{p_0 q_0 \times \frac{n_2}{n_1 (n_1 + n_2)}} = \sqrt{0.05 \times 0.95 \times \frac{19000}{1000 \times 20000}} = \sqrt{0.000045} = 0.0067$$

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{0.15}{0.0067} = 22.39$$

Since the difference is more than 2.58 S.E. (1% level of significance), it could not have arisen due to fluctuations of sampling. Hence there is a significant difference between the proportion of smokers in the college and the university, the proportion being significantly higher in the college.
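The finite-population formula used in Illustration 12 can be sketched the same way. This assumes the formula exactly as stated in the text, with $n_1 + n_2$ equal to the population size; the helper name `se_sample_vs_population` is hypothetical.

```python
import math


def se_sample_vs_population(p0, n1, population_size):
    """S.E. of (p1 - p0) when a sample of n1 is part of a finite population.

    Evaluates sqrt(p0 * q0 * n2 / (n1 * (n1 + n2))), where n1 + n2 is the
    population size, as in the formula given in the text.
    """
    q0 = 1 - p0
    n2 = population_size - n1        # population members outside the sample
    return math.sqrt(p0 * q0 * n2 / (n1 * (n1 + n2)))


p1 = 200 / 1_000        # proportion of smokers in the college
p0 = 1_000 / 20_000     # proportion of smokers in the university
se = se_sample_vs_population(p0, 1_000, 20_000)
ratio = (p1 - p0) / se
print(f"S.E. = {se:.4f}, difference / S.E. = {ratio:.1f}")
```

Carrying the unrounded standard error (about 0.00672) gives a ratio of about 22.3 instead of 22.39; either way it far exceeds 2.58.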
II. SAMPLING OF VARIABLES (LARGE SAMPLES)

Having discussed the problems relating to sampling of attributes in the previous section, we now come to the problems of sampling of variables such as height, weight, etc., which may take any value. It shall not, therefore, be possible for us to classify each member of a sample under one of two heads, success or failure. The values of the variable given by different trials will be spread over a range, which may be limited by practical considerations, as in the case of the weight of people, or by theoretical considerations, as in the case of the correlation coefficient, which cannot lie outside the range −1 to +1.


A-3118 SAMPLING AND TESTS OF SIGNIFICANCE 


As in the case of attributes, there are three objects in studying problems relating to sampling of variables:

(i) To compare observation with expectation and to see how far the deviation of one from the other can be attributed to fluctuations of sampling;

(ii) To estimate from samples some characteristic of the parent population, such as the mean of a variable; and

(iii) To gauge the reliability of our estimates.
Difference between Small and Large Samples 


In this section we shall be studying problems relating to large 
samples only. Though it is difficult to draw a clear-cut line of demarcation 
between large and small samples, it is normally agreed amongst statisticians 
that a sample is to be regarded as large only if its size exceeds 30. The
tests of significance used for dealing with problems relating to large 
samples are different from the ones used for small samples for the reason 
that the assumptions that we make in case of large samples do not hold 
good for small samples. The assumptions made while dealing with 
problems relating to large samples are : 

(i) The random sampling distribution of a statistic is approximately 
normal ; and 

(ii) Values given by the samples are sufficiently close to the 
population value and can be used in its place for the standard error of the 
estimate. 

While testing the significance of a statistic in case of large samples, the concept of standard error discussed earlier is used. The following is a list of the formulae for obtaining standard error for different statistics:


1. Standard Error of Mean*
(i) When standard deviation of the population is known

$$\text{S.E.}_{\bar{X}} = \frac{\sigma_p}{\sqrt{N}}$$

where $\text{S.E.}_{\bar{X}}$ refers to the standard error of the mean
$\sigma_p$ = standard deviation of the population
N = number of observations in the sample.

(ii) When standard deviation of population is not known, we have to use standard deviation of the sample in calculating standard error of mean. Consequently the formula for calculating standard error is

$$\text{S.E.}_{\bar{X}} = \frac{\sigma\ (\text{sample})}{\sqrt{N}}$$

where σ denotes standard deviation of the sample.
It should be noted that if the standard deviations of both the sample and the population are available, then we must prefer the standard deviation of the population for calculating the standard error of mean.


* The standard error of the mean measures only sampling errors. Sampling 
errors are errors involved in estimating a population parameter from a sample instead of 
including all the essential information in the population. 
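Fiducial limits of population mean:

95% fiducial limits of population mean are
$$\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{N}}$$

99% fiducial limits of population mean are
$$\bar{X} \pm 2.58\,\frac{\sigma}{\sqrt{N}}$$

S.E. of Mean Deviation: $\text{S.E.}_{MD} = 0.6028\,\dfrac{\sigma}{\sqrt{N}}$

S.E. of Coefficient of Variation: $\text{S.E.}_{V} = \dfrac{V}{\sqrt{2N}}\sqrt{1 + 2\left(\dfrac{V}{100}\right)^2}$

S.E. of Median: $\text{S.E.}_{Med} = 1.25331\,\dfrac{\sigma}{\sqrt{N}}$

S.E. of Quartiles: $\text{S.E.}_{Q} = 1.36263\,\dfrac{\sigma}{\sqrt{N}}$

S.E. of Quartile Deviation: $\text{S.E.}_{QD} = 0.78672\,\dfrac{\sigma}{\sqrt{N}}$

S.E. of Standard Deviation: $\text{S.E.}_{\sigma} = \dfrac{\sigma}{\sqrt{2N}}$

S.E. of Variance: $\text{S.E.}_{\sigma^2} = \sigma^2\sqrt{\dfrac{2}{N}}$

S.E. of Coefficient of Skewness: $\text{S.E.}_{Sk} = \sqrt{\dfrac{3}{2N}}$

S.E. of Coefficient of Correlation: $\text{S.E.}_{r} = \dfrac{1 - r^2}{\sqrt{N}}$

S.E. of Regression Coefficient of y on x: $\text{S.E.}_{b_{yx}} = \dfrac{\sigma_y\sqrt{1 - r^2}}{\sigma_x\sqrt{N}}$

S.E. of Regression Coefficient of x on y: $\text{S.E.}_{b_{xy}} = \dfrac{\sigma_x\sqrt{1 - r^2}}{\sigma_y\sqrt{N}}$

S.E. of Regression Estimate of X on Y: $\text{S.E.}_{xy} = \sigma_x\sqrt{1 - r^2}$

S.E. of Regression Estimate of Y on X: $\text{S.E.}_{yx} = \sigma_y\sqrt{1 - r^2}$

S.E. of Coefficient of Association: $\text{S.E.}_{Q} = \dfrac{1 - Q^2}{2}\sqrt{\dfrac{1}{(AB)} + \dfrac{1}{(A\beta)} + \dfrac{1}{(\alpha B)} + \dfrac{1}{(\alpha\beta)}}$

S.E. of Rank Correlation Coefficient: $\text{S.E.}_{\rho} = \dfrac{1}{\sqrt{N - 1}}$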
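Several formulas in this list translate directly into code. The sketch below, assuming only Python's standard library, implements a handful of them for a large sample from a normal population; the function names are hypothetical and each function simply evaluates the corresponding expression.

```python
import math

# Large-sample standard errors from the list above.
# sigma = standard deviation, n = sample size, r = correlation coefficient.


def se_mean(sigma, n):
    return sigma / math.sqrt(n)


def se_median(sigma, n):
    return 1.25331 * sigma / math.sqrt(n)


def se_sd(sigma, n):
    return sigma / math.sqrt(2 * n)


def se_corr(r, n):
    return (1 - r ** 2) / math.sqrt(n)


def se_rank_corr(n):
    return 1 / math.sqrt(n - 1)


# Example values: sigma = 25.2, n = 400 for the median; r = 0.8, n = 1600
print(round(se_median(25.2, 400), 2))
print(round(se_corr(0.8, 1600), 3))
```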


The following examples will illustrate how standard error of some of 
the statistics is calculated : 
* Illustration 13. Under what assumptions would you test significance in large samples? Calculate the standard error of the mean from the following data:

Wages per week (in Rs.)    No. of persons
Up to 10                         50
Up to 20                        150
Up to 30                        300
Up to 40                        500
Up to 50                        700
Up to 60                        800
Up to 70                        900
Up to 80                      1,000
(B. Com., Karnataka, 1973)

Solution. $\text{S.E.}_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$

CALCULATION OF STANDARD DEVIATION

Wages per week   No. of        m     d' = (m − 35)/10     fd'      fd'²
(Rs.)            persons (f)
0-10                 50         5           −3            −150       450
10-20               100        15           −2            −200       400
20-30               150        25           −1            −150       150
30-40               200        35            0               0         0
40-50               200        45           +1             200       200
50-60               100        55           +2             200       400
60-70               100        65           +3             300       900
70-80               100        75           +4             400     1,600
                 N = 1,000                              Σfd' = 600   Σfd'² = 4,100

$$\sigma = \sqrt{\frac{\Sigma fd'^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2} \times i = \sqrt{\frac{4100}{1000} - \left(\frac{600}{1000}\right)^2} \times 10 = \sqrt{4.1 - 0.36} \times 10 = 1.934 \times 10 = 19.34$$

$$\text{S.E.}_{\bar{X}} = \frac{19.34}{\sqrt{1000}} = \frac{19.34}{31.62} = 0.612$$
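The step-deviation calculation of Illustration 13 can be reproduced in a short Python sketch. It assumes the "up to" figures are cumulative frequencies with class width 10 (so the midpoints are 5, 15, ..., 75), exactly as in the calculation table; `grouped_mean_sd` is a hypothetical helper name.

```python
import math


def grouped_mean_sd(midpoints, freqs, assumed_mean, width):
    """Mean and S.D. of grouped data by the step-deviation method,
    with d' = (m - A) / i as in the calculation table."""
    n = sum(freqs)
    d = [(m - assumed_mean) / width for m in midpoints]
    sum_fd = sum(f * x for f, x in zip(freqs, d))
    sum_fd2 = sum(f * x * x for f, x in zip(freqs, d))
    mean = assumed_mean + width * sum_fd / n
    sd = width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return n, mean, sd


# Convert the "up to" (cumulative) table into class frequencies
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
cumulative = [50, 150, 300, 500, 700, 800, 900, 1000]
freqs = [c - prev for c, prev in zip(cumulative, [0] + cumulative[:-1])]
midpoints = [u - 5 for u in upper_limits]      # class width is 10

n, mean, sd = grouped_mean_sd(midpoints, freqs, assumed_mean=35, width=10)
se_of_mean = sd / math.sqrt(n)
print(f"S.D. = {sd:.2f}, S.E. of mean = {se_of_mean:.3f}")
```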


Illustration 14. The mean height obtained from a sample of size 100 taken randomly from a population is 64 inches. The S.D. of the height distribution of the population is 3 inches. Set up probable limits of the mean height of the population.

(M. Com., Nagpur, 1972)


Solution. $\text{S.E.}_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}} = \dfrac{3}{\sqrt{100}} = 0.3$

95% probable limits of the mean height of the population
$$= \bar{X} \pm 1.96\ \text{S.E.} = 64 \pm 1.96(0.3) = 64 \pm 0.588 = 63.4 \text{ and } 64.6$$

99% probable limits of the mean height of the population
$$= \bar{X} \pm 2.58\ \text{S.E.} = 64 \pm 2.58(0.3) = 64 \pm 0.774 = 63.2 \text{ and } 64.8$$
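The limits in Illustration 14 follow from $\bar{X} \pm z\,\sigma/\sqrt{N}$ with z = 1.96 or 2.58. A minimal Python sketch, with a hypothetical helper name `mean_limits`:

```python
import math


def mean_limits(sample_mean, sigma, n, z):
    """Limits X-bar +/- z * sigma / sqrt(n) for a large sample."""
    se = sigma / math.sqrt(n)
    return sample_mean - z * se, sample_mean + z * se


limits_95 = mean_limits(64, 3, 100, 1.96)
limits_99 = mean_limits(64, 3, 100, 2.58)
print([round(x, 1) for x in limits_95], [round(x, 1) for x in limits_99])
```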
Illustration 15. A sample of 100 iron rods is said to be drawn from a large number of rods whose lengths are normally distributed with mean 3 ft. and standard deviation 0.6 ft. If the sample mean is 3.2 ft., can the sample be regarded as a truly random sample? (M. Com., Meerut, 1973)


Solution. Let us take the hypothesis that there is no difference between the sample mean and the population mean.

$$\text{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{N}}; \quad \sigma = 0.6,\ N = 100$$

$$\text{S.E.}_{\bar{X}} = \frac{0.6}{\sqrt{100}} = \frac{0.6}{10} = 0.06$$

Difference between sample mean and hypothetical mean = 3.2 − 3 = 0.2

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{0.2}{0.06} = 3.33$$

Since the difference is more than 2.58 S.E. (1% level of significance), it could not have arisen due to fluctuations of sampling. Hence the sample cannot be regarded as a truly random sample.

Illustration 16. A sample of 100 students is taken from a large population. The mean height of these students is 64 inches and the standard deviation 4 inches. Can it reasonably be regarded that in the population the mean height is 66 inches?

(B. Com., Bombay, 1975)

Solution. We have to test the hypothesis that the mean height in the population is 66 inches.

$$\text{S.E. of mean} = \frac{\sigma}{\sqrt{N}} = \frac{4}{\sqrt{100}} = 0.4$$

Difference between the sample mean and population mean = 66 − 64 = 2

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{2}{0.4} = 5$$

Since the difference between sample mean and population mean is more than 2.58 S.E. (1% level of significance), the result of the experiment does not support the hypothesis that the mean height in the population is 66 inches.
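Illustrations 15 and 16 both compare |sample mean − hypothetical mean| with its standard error. A small sketch of that comparison, assuming the 1.96 and 2.58 cut-offs used throughout this section (the helper names are hypothetical):

```python
import math


def mean_z_ratio(sample_mean, hypothetical_mean, sd, n):
    """|X-bar - mu| divided by the standard error sd / sqrt(n)."""
    return abs(sample_mean - hypothetical_mean) / (sd / math.sqrt(n))


def verdict(ratio):
    if ratio > 2.58:
        return "significant at 1% level"
    if ratio > 1.96:
        return "significant at 5% level"
    return "not significant"


z_rods = mean_z_ratio(3.2, 3.0, 0.6, 100)     # Illustration 15
z_heights = mean_z_ratio(64, 66, 4, 100)      # Illustration 16
for z in (z_rods, z_heights):
    print(f"{z:.2f}: {verdict(z)}")
```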

Illustration 17. If it costs a rupee to draw one member of a sample, how much would it cost, in sampling from a universe with mean 100 and standard deviation 10, to take a sufficient number to ensure that the mean of a sample would, at the 5 per cent level of probability, be within 0.01 per cent of the true value? Find the extra cost necessary to double this precision. (M. Com., Banaras, 1976)
Solution. $\bar{X} = 100$

Difference between universe mean and sample mean = 0.01 per cent of 100 = 0.01

$$\text{S.E. of mean} = \frac{\sigma}{\sqrt{N}} = \frac{10}{\sqrt{N}}$$

For 95% confidence, the difference between sample mean and population mean should be equal to 1.96 S.E., i.e.,

$$\frac{1.96 \times 10}{\sqrt{N}} = 0.01$$

$$\sqrt{N} \times 0.01 = 19.6 \quad\Rightarrow\quad \sqrt{N} = \frac{19.6}{0.01} = 1960$$

$$N = (1960)^2 = 38{,}41{,}600$$

At a rupee per member, the sample costs Rs. 38,41,600.

Double the precision would imply that the difference between sample mean and population mean should be 0.005 only.

$$\frac{19.6}{\sqrt{N}} = 0.005 \quad\text{or}\quad \sqrt{N} = \frac{19.6}{0.005} = 3920$$

$$N = (3920)^2 = 1{,}53{,}66{,}400$$

Therefore, extra cost = 1,53,66,400 − 38,41,600 = 1,15,24,800

Hence in order to double the precision the extra cost required is Rs. 1,15,24,800.
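The sample-size reasoning of Illustration 17 rearranges $z\sigma/\sqrt{N} = E$ into $N = (z\sigma/E)^2$. A minimal sketch, assuming a cost of Re. 1 per member as in the question; `required_sample_size` is a hypothetical helper name.

```python
def required_sample_size(sigma, max_error, z=1.96):
    """N from z * sigma / sqrt(N) = max_error, i.e. N = (z * sigma / max_error) ** 2."""
    return (z * sigma / max_error) ** 2


n_basic = round(required_sample_size(10, 0.01))    # precision of 0.01
n_double = round(required_sample_size(10, 0.005))  # twice the precision
extra_cost = n_double - n_basic                    # at Re. 1 per member
print(n_basic, n_double, extra_cost)
```

Here N comes out as a whole number, so `round` only removes floating-point noise; when $(z\sigma/E)^2$ is fractional one would normally round up (for example with `math.ceil`) to guarantee the required precision.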
Illustration 18. 400 labourers were selected at random from a certain district. Their median income was Rs. 140.5 p.m. with a standard deviation of Rs. 25.2. Do you believe that the average income of the labour community in the district is Rs. 150?

Solution. Let us assume that the distribution of the population (i.e., labourers) in the district is normal, so that its median and mean coincide. For a normal distribution the standard error of median

$$= 1.2533\,\frac{\sigma}{\sqrt{N}}; \quad \sigma = 25.2,\ N = 400$$

$$\therefore\ \text{S.E. of median} = 1.2533 \times \frac{25.2}{\sqrt{400}} = 1.2533 \times \frac{25.2}{20} = 1.58$$

Difference between sample median income and assumed population median income = 150 − 140.5 = 9.5

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{9.5}{1.58} = 6.01$$

Since the difference is more than 2.58 S.E. (1% level), it is unlikely that the average income of the labour community in the district is Rs. 150.

Illustration 19. To study the correlation between the stature of the father and the stature of the son, a sample of 1,600 is taken from the universe of fathers and sons. The sample study gives the correlation between the two to be 0.80. Within what limits does it hold true for the universe?

Solution. The standard error of the correlation coefficient between the stature of the father and son is

$$\text{S.E.}_r = \frac{1 - r^2}{\sqrt{N}}; \quad r = 0.8,\ N = 1{,}600$$

$$\text{S.E.}_r = \frac{1 - 0.64}{\sqrt{1600}} = \frac{0.36}{40} = 0.009$$

If the sampling was simple random sampling, the correlation in the universe is very unlikely to differ from the correlation in the sample by more than three times this standard error, i.e., 0.027. Hence the correlation in the universe most probably lies between r ± 3 S.E., i.e., 0.8 ± 0.027, or 0.773 and 0.827.

Illustration 20. Find the regression coefficient of Bombay prices over Calcutta prices from the following data and also calculate its standard error.

                                         Bombay     Calcutta
                                           Rs.         Rs.
Average price per quintal of wheat         120         130
Standard deviation                           4           5
Coefficient of correlation = 0.6
N = 100

Solution. Regression coefficient of Bombay prices (x) over Calcutta prices (y)

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.6 \times \frac{4}{5} = 0.48$$

The standard error of this regression coefficient is

$$\text{S.E.}_{b_{xy}} = \frac{\sigma_x\sqrt{1 - r^2}}{\sigma_y\sqrt{N}} = \frac{4\sqrt{1 - 0.36}}{5\sqrt{100}} = \frac{4 \times 0.8}{50} = 0.064$$

Thus the regression coefficient of Bombay prices over Calcutta prices is 0.48 and its standard error is 0.064.

Illustration 21. For a given group of adults, the coefficient of correlation between height and weight is 0.7, standard deviations of height and of weight are 2 inches and 10 lb. respectively, and the means in height and weight for the entire group are 70 inches and 130 lb. respectively. Find out the best estimate of the weight of an individual who is 65 inches tall. Assign limits to this estimate within which in all probability his actual weight would be lying.

Solution. The regression of weight (Y) on height (X) is

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$

$\bar{Y} = 130$, $\bar{X} = 70$, $\sigma_y = 10$, $\sigma_x = 2$, $r = 0.7$

$$Y - 130 = 0.7 \times \frac{10}{2}(X - 70)$$
$$Y - 130 = 3.5(X - 70) = 3.5X - 245$$
$$Y = 3.5X - 115$$

When X = 65, Y will be

$$Y = 3.5(65) - 115 = 112.5 \text{ lb.}$$

The standard error of estimate is

$$\text{S.E.}_{yx} = \sigma_y\sqrt{1 - r^2} = 10\sqrt{1 - (0.7)^2} = 10\sqrt{0.51} = 10 \times 0.714 = 7.14 \text{ lb.}$$

Thus the best estimate of the weight of an individual who is 65 inches tall is 112.5 lb., and in all probability this estimate will not deviate from his actual weight by more than three times the standard error. Hence in all probability his actual weight would be lying between 112.5 ± 3(7.14), or 112.5 ± 21.42, i.e., 91.08 lb. and 133.92 lb.
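Illustrations 20 and 21 rely on $b = r\,\sigma_{dep}/\sigma_{indep}$, its standard error, and the standard error of estimate $\sigma_y\sqrt{1 - r^2}$. The sketch below evaluates both; the helper names are hypothetical, and k = 3 reproduces the "three standard errors" limits used in Illustration 21.

```python
import math


def regression_coefficient(r, sd_dep, sd_indep, n):
    """b = r * sd_dep / sd_indep and its S.E. = sd_dep*sqrt(1-r^2) / (sd_indep*sqrt(n))."""
    b = r * sd_dep / sd_indep
    se = sd_dep * math.sqrt(1 - r ** 2) / (sd_indep * math.sqrt(n))
    return b, se


def estimate_with_limits(x, mean_x, mean_y, sd_x, sd_y, r, k=3):
    """Regression estimate of Y from X with limits +/- k * sd_y * sqrt(1 - r^2)."""
    y_hat = mean_y + r * sd_y / sd_x * (x - mean_x)
    se_est = sd_y * math.sqrt(1 - r ** 2)
    return y_hat, y_hat - k * se_est, y_hat + k * se_est


# Illustration 20: Bombay prices (S.D. 4) on Calcutta prices (S.D. 5), r = 0.6, N = 100
b_xy, se_bxy = regression_coefficient(0.6, 4, 5, 100)
# Illustration 21: weight (mean 130, S.D. 10) from height (mean 70, S.D. 2), r = 0.7
y_hat, low, high = estimate_with_limits(65, 70, 130, 2, 10, 0.7)
print(f"b = {b_xy:.2f}, S.E. = {se_bxy:.3f}")
print(f"estimate = {y_hat:.1f} lb, limits {low:.2f} to {high:.2f} lb")
```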


Illustration 22. From the following data compute the value of Yule's coefficient of association and find out its standard error. Also determine 95 per cent fiducial limits of the true value in the population.

Married and failures = 100
Married and non-failures = 50
Unmarried and failures = 20
Unmarried and non-failures = 80
Solution. Let A denote married and B failures.

Therefore α will denote unmarried and β non-failures. The given information in terms of these symbols is

(AB) = 100, (Aβ) = 50, (αB) = 20, (αβ) = 80

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \frac{(100 \times 80) - (50 \times 20)}{(100 \times 80) + (50 \times 20)} = \frac{7000}{9000} = 0.778$$

$$\text{S.E.}_Q = \frac{1 - Q^2}{2}\sqrt{\frac{1}{(AB)} + \frac{1}{(A\beta)} + \frac{1}{(\alpha B)} + \frac{1}{(\alpha\beta)}}$$

$$= \frac{1 - (0.778)^2}{2}\sqrt{\frac{1}{100} + \frac{1}{50} + \frac{1}{20} + \frac{1}{80}}$$

$$= \frac{1 - 0.605}{2}\sqrt{0.01 + 0.02 + 0.05 + 0.0125} = 0.1975\sqrt{0.0925} = 0.1975 \times 0.304 = 0.06$$

95% fiducial limits:

Q ± 1.96 S.E. will give the required limits

$$0.778 \pm 1.96(0.06) = 0.778 \pm 0.1176 = 0.66 \text{ and } 0.896$$
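Yule's Q and its standard error, as computed in Illustration 22, in a short Python sketch. The helper name and argument order are hypothetical; the arguments are the four class frequencies (AB), (Aβ), (αB), (αβ).

```python
import math


def yules_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's Q and its S.E. = (1 - Q^2)/2 * sqrt(sum of reciprocals of the four classes)."""
    like = ab * alpha_beta          # (AB)(alpha beta)
    unlike = a_beta * alpha_b       # (A beta)(alpha B)
    q = (like - unlike) / (like + unlike)
    se = (1 - q ** 2) / 2 * math.sqrt(1 / ab + 1 / a_beta + 1 / alpha_b + 1 / alpha_beta)
    return q, se


# (AB) married failures, (A beta) married non-failures,
# (alpha B) unmarried failures, (alpha beta) unmarried non-failures
q, se = yules_q(100, 50, 20, 80)
lower, upper = q - 1.96 * se, q + 1.96 * se
print(f"Q = {q:.3f}, S.E. = {se:.3f}, 95% limits {lower:.2f} to {upper:.3f}")
```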
Standard Error of the Difference between the Means of two Samples

(i) If two independent random samples with $n_1$ and $n_2$ numbers respectively are drawn from the same population of standard deviation σ, the standard error of the difference between the sample means is given by the formula:

S.E. of the difference between sample means

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

If σ is unknown, sample standard deviation for combined samples must be substituted.

(ii) If two random samples with $\bar{X}_1$, $\sigma_1$, $n_1$ and $\bar{X}_2$, $\sigma_2$, $n_2$ respectively are drawn from different populations, then the S.E. of the difference between the means is given by the formula:

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

and where $\sigma_1$ and $\sigma_2$ are unknown,

S.E. of the difference between means

$$= \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

where $S_1$ and $S_2$ represent standard deviations of the two samples.

Illustration 23. (a) Random samples drawn from two countries gave the following data relating to the height of adult males:

                                   Country A     Country B
Mean height (in inches)              67.42         67.25
Standard deviation                    2.58          2.50
Number of observations               1,000         1,200
Solution. Let us take the hypothesis that there is no difference in the mean heights of adult males in the two countries.

S.E. of the difference of means is given by

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$\sigma_1 = 2.58$, $\sigma_2 = 2.50$, $n_1 = 1{,}000$, $n_2 = 1{,}200$

Substituting the values

$$= \sqrt{\frac{(2.58)^2}{1000} + \frac{(2.50)^2}{1200}} = \sqrt{0.0067 + 0.0052} = \sqrt{0.0119} = 0.11$$

Difference in the mean heights in the two countries = 67.42 − 67.25 = 0.17

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{0.17}{0.11} = 1.55$$

Since the difference in the mean heights of the two samples is less than 1.96 S.E. (5 per cent level), it could have arisen due to fluctuations of sampling. Hence we conclude that the mean heights do not differ significantly.


(b) A man buys 100 electric bulbs of each of two well-known makes, taken at random from stock for testing purposes. He finds that 'make A' has a mean life of 1,300 hours with a standard deviation of 82 hours, and 'make B' has a mean life of 1,248 hours with a standard deviation of 93 hours. Discuss the significance of these results.

(M. Com., Delhi, 1975)


Solution. We are given

$\bar{X}_1 = 1{,}300$, $\sigma_1 = 82$, $n_1 = 100$
$\bar{X}_2 = 1{,}248$, $\sigma_2 = 93$, $n_2 = 100$

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(82)^2}{100} + \frac{(93)^2}{100}} = \sqrt{67.24 + 86.49} = \sqrt{153.73} = 12.4$$

Observed difference between the two means = 1,300 − 1,248 = 52

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{52}{12.4} = 4.19$$

Since the difference is more than 2.58 S.E. (1 per cent level of significance), it could not have arisen due to fluctuations of sampling and hence we conclude that the difference is significant.


Illustration 24. (a) 490 male students and 450 female students appeared at an examination in Statistics. The mean and standard deviation in marks of male students are 54.3 and 17.5 respectively, whereas those of female students are 50.6 and 18.0. Is there a significant difference in the marks of male and female students?


Solution. Let us take the hypothesis that the marks of male and female 
students do not differ significantly. 
S.E. of the difference of two means is given by:

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$\sigma_1 = 17.5$, $\sigma_2 = 18$, $n_1 = 490$, $n_2 = 450$

$$= \sqrt{\frac{(17.5)^2}{490} + \frac{(18)^2}{450}} = \sqrt{0.625 + 0.720} = \sqrt{1.345} = 1.16$$

The observed difference between the mean marks of male and female students = 54.3 − 50.6 = 3.7

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{3.7}{1.16} = 3.19$$

Since the difference in the mean marks of male and female students is more than 2.58 S.E. (1 per cent level), we conclude that the marks of male and female students differ significantly.

(b) In a survey of buying habits, 400 women shoppers are chosen at random in super market 'A' located in a certain section of Bombay City. Their average monthly food expenditure is Rs. 400 with a standard deviation of Rs. 12. For 400 women shoppers chosen at random in super market 'B' in another section of the city, the average monthly food expenditure is Rs. 395 with a standard deviation of Rs. 15. Test at a level of 0.05 whether the average food expenditures of the two populations of shoppers from which the samples were obtained are equal. (M. Com., Delhi, 1973)

Solution. Let us take the hypothesis that there is no difference in the average food expenditure of the two populations of shoppers.

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$n_1 = 400$, $\bar{X}_1 = 400$, $\sigma_1 = 12$
$n_2 = 400$, $\bar{X}_2 = 395$, $\sigma_2 = 15$

Substituting the values

$$= \sqrt{\frac{(12)^2}{400} + \frac{(15)^2}{400}} = \sqrt{0.36 + 0.5625} = \sqrt{0.9225} = 0.96$$

$\bar{X}_1 - \bar{X}_2 = 400 - 395 = 5$

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{5}{0.96} = 5.21$$

Since the difference is more than 1.96 S.E. (5% level), the hypothesis is rejected. Hence the average food expenditures of the two populations of shoppers from which the samples were obtained are not equal.
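The four large-sample comparisons of means in Illustrations 23 and 24 all use $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. A compact sketch that recomputes them, assuming only the standard library (the helper name `diff_means_ratio` is hypothetical):

```python
import math


def diff_means_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Returns (S.E., |difference| / S.E.) with S.E. = sqrt(sd1^2/n1 + sd2^2/n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return se, abs(mean1 - mean2) / se


cases = {
    "23(a) heights": (67.42, 2.58, 1000, 67.25, 2.50, 1200),
    "23(b) bulbs": (1300, 82, 100, 1248, 93, 100),
    "24(a) marks": (54.3, 17.5, 490, 50.6, 18.0, 450),
    "24(b) food": (400, 12, 400, 395, 15, 400),
}
ratios = {}
for name, data in cases.items():
    se, ratio = diff_means_ratio(*data)
    ratios[name] = ratio
    print(f"{name}: S.E. = {se:.3f}, difference / S.E. = {ratio:.2f}")
```

Without intermediate rounding the ratio for 23(a) is about 1.56 rather than 1.55; none of the conclusions change.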


hours with a standard deviation of 60 hours. Can the buyer be quite certain that the


Solution. Let us take the hypothesis that the two brands do not differ significantly in quality.

$N_1 = 100$, $\bar{X}_1 = 1{,}210$, $\sigma_1 = 40$
$N_2 = 100$, $\bar{X}_2 = 1{,}250$, $\sigma_2 = 60$

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} = \sqrt{\frac{1600}{100} + \frac{3600}{100}} = \sqrt{52} = 7.21$$

$\bar{X}_2 - \bar{X}_1 = 1250 - 1210 = 40$

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{40}{7.21} = 5.55$$

Since the difference between the two averages is more than 2.58 S.E. (1% level), it could not have arisen due to sampling fluctuations. Hence the two brands differ significantly in quality.


Standard Error of the Difference between two Standard Deviations

In case of two large random samples, each drawn from a normally distributed population, the S.E. of the difference between the standard deviations is given by:

$$\text{S.E.}(\sigma_1 - \sigma_2) = \sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$

Where population standard deviations are not known,

$$\text{S.E.}(s_1 - s_2) = \sqrt{\frac{s_1^2}{2n_1} + \frac{s_2^2}{2n_2}}$$


Illustration 26. Intelligence tests of two groups of boys and girls give the following results:

Girls: Mean = 84, S.D. = 10, N = 121
Boys: Mean = 81, S.D. = 12, N = 81

Examine (a) Is the difference in mean scores significant?
(b) Is the difference between standard deviations significant?

(M.A. Econ., Punjab, 1976)


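Solution. (a) Let us take the hypothesis that there is no difference in mean scores.

$$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$\sigma_1 = 10$, $\sigma_2 = 12$, $n_1 = 121$, $n_2 = 81$

$$= \sqrt{\frac{(10)^2}{121} + \frac{(12)^2}{81}} = \sqrt{\frac{100}{121} + \frac{144}{81}} = \sqrt{0.826 + 1.778} = \sqrt{2.604} = 1.61$$

Difference of means = 84 − 81 = 3

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{3}{1.61} = 1.86$$

Since the difference is less than 1.96 S.E. (at 5% level), our hypothesis holds good. Hence the difference in mean scores of boys and girls is not significant.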


(b) Let us take the hypothesis that there is no difference between the standard deviations of the two samples.

$$\text{S.E.}(\sigma_1 - \sigma_2) = \sqrt{\frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}}$$

$\sigma_1 = 10$, $\sigma_2 = 12$, $n_1 = 121$, $n_2 = 81$

$$= \sqrt{\frac{(10)^2}{2 \times 121} + \frac{(12)^2}{2 \times 81}} = \sqrt{\frac{100}{242} + \frac{144}{162}} = \sqrt{1.302} = 1.14$$

Difference between the two standard deviations = 12 − 10 = 2

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{2}{1.14} = 1.75$$

Since the difference is less than 1.96 S.E. (at 5% level), our hypothesis holds good. Hence the difference between standard deviations is not significant.


Illustration 27. The mean produce of wheat of a sample of 100 fields is 200 lb. per acre with a standard deviation of 10 lb. Another sample of 150 fields gives the mean at 220 lb. with a standard deviation of 12 lb. Assuming the standard deviation of the mean field at 11 lb. for the universe, find at 1% level if the two results are consistent. (M.A. Econ., Punjab, 1974)
Solution.

$$\text{S.E.}(s_1 - s_2) = \sqrt{\frac{\sigma^2}{2n_1} + \frac{\sigma^2}{2n_2}}; \quad \sigma = 11,\ n_1 = 100,\ n_2 = 150$$

$$= \sqrt{\frac{(11)^2}{2 \times 100} + \frac{(11)^2}{2 \times 150}} = \sqrt{0.605 + 0.403} = \sqrt{1.008} = 1.004$$

Difference between two standard deviations = 12 − 10 = 2 lb.

$$\frac{\text{Difference}}{\text{S.E.}} = \frac{2}{1.004} = 1.99$$

Since the difference is less than 2.58 S.E. (1% level of significance), the two results may be regarded as consistent.
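Both tests on standard deviations (Illustrations 26(b) and 27) use $\sqrt{\sigma_1^2/2n_1 + \sigma_2^2/2n_2}$; in Illustration 27 the same universe S.D. of 11 lb. is used for both samples. A minimal sketch with a hypothetical helper name `se_diff_sd`:

```python
import math


def se_diff_sd(sd1, n1, sd2, n2):
    """S.E. of the difference between two standard deviations (large samples)."""
    return math.sqrt(sd1 ** 2 / (2 * n1) + sd2 ** 2 / (2 * n2))


# Illustration 26(b): sample S.D.s 10 (n = 121) and 12 (n = 81)
se_26 = se_diff_sd(10, 121, 12, 81)
ratio_26 = (12 - 10) / se_26
# Illustration 27: common universe S.D. of 11 lb, samples of 100 and 150 fields
se_27 = se_diff_sd(11, 100, 11, 150)
ratio_27 = (12 - 10) / se_27
print(f"26(b): {ratio_26:.2f} (5% cut-off 1.96); 27: {ratio_27:.2f} (1% cut-off 2.58)")
```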
III. SAMPLING OF VARIABLES (SMALL SAMPLES)


So far we have discussed problems relating to large samples. When the size of sample is small (less than 30) the above tests are not applicable because the assumptions on which they are based generally do not hold good in case of small samples. In particular, it will no longer be possible for us to assume (a) that the random sampling distribution of a statistic is approximately normal, and (b) that values given by the sample data are sufficiently close to the population values and can be used in their place for the calculation of the standard error of the estimate.

The removal of these assumptions makes it necessary to use entirely new techniques to deal with the problems of small samples. The division between the theories of large and small samples is, therefore, a very real one, though it is not always easy to draw a precise line of demarcation. It should be noted that as a rule the methods and the theory of small samples are applicable to large samples, though the reverse is not true.


While dealing with small samples our main interest is not to estimate the population values as is true in large samples; rather our interest lies in testing a given hypothesis, i.e., in ascertaining whether observed

* Yule and Kendall: An Introduction to the Theory of Statistics, p. 482.

an uncorrelated population, i.e., whether it is significant of correlation in
the parent population. 


It should be noted that the investigator who works with very small samples must know that his estimates will vary widely from sample to sample. Moreover, he must content himself with relatively wide confidence intervals. Precision of statement is less, of course, the wider the intervals employed. Each inference drawn from large sample results is far more precise in the limits it sets up than is an inference based on a much smaller sample.

The Assumption of Normality


While dealing with small samples also, an assumption is made that the parent population is normal, unless otherwise stated. Strictly speaking, therefore, our results will be true only for the normal population. However, as pointed out earlier, the assumption of normality is not always warranted in the case of small samples. Experiments have, therefore, been made to ascertain whether the results are true for other types of population. Theoretical work confirms that the results remain true for populations which do not deviate markedly from normality. However, if there is any good reason to suspect that the parent population is markedly skewed, or U- or J-shaped, the methods given below cannot be applied with much confidence.

Since in many of the problems it becomes necessary to take a small-size sample, considerable attention has been paid to developing suitable tests for dealing with problems of small samples. The greatest contribution to the theory of small samples is that of William S. Gosset and R.A. Fisher. Gosset published his discovery in 1908 under the pen name 'Student'. He gave a test popularly known as the 't-test', and Fisher gave another test known as the 'z-test'. These tests are based on the 't'-distribution and the 'z'-distribution.

Student's t-Distribution 

Gosset, in a paper published in 1908, derived a theoretical distribution which has come to be known as "Student's t-distribution". The quantity t is defined as

$$t = \frac{\bar{X} - \mu}{S}\sqrt{n}$$

where $\bar{X}$ = mean of the sample
μ = mean of the parent population from which the sample has been drawn
S = the standard deviation of the sample
n = the number of observations in the sample.


It should be noted that the estimate of the population mean, μ, is always $\bar{X} = \frac{\Sigma X}{n}$, but the estimate of the population variance in a small sample is $\frac{\Sigma(X - \bar{X})^2}{n - 1}$ instead of $\frac{\Sigma(X - \bar{X})^2}{n}$, which is permissible in the case of a large sample. Hence

$$S = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}}$$

is based on (n − 1) degrees of freedom. When n is small the distribution of t is far from normal. When n is infinite the two distributions are identical, and for values of n over 30 the differences are small, i.e., the curve closely approximates the standardized normal curve.


The distribution curve of the statistic t is

$$y = \frac{y_0}{\left(1 + \frac{t^2}{v}\right)^{(v+1)/2}}$$

where v = number of degrees of freedom* and $y_0$ is such that the total area under the curve is unity.

The t-curve has a mode, coinciding with the mean at t = 0. It is symmetrical and, like the normal curve, extends to infinity on either side of the mean.

As v tends to infinity, $\dfrac{1}{\left(1 + \frac{t^2}{v}\right)^{(v+1)/2}}$ tends to $e^{-t^2/2}$ and hence t is distributed normally.
Properties of t-Distribution
The following properties of the t-distribution are worth noting:

(1) The t-distribution ranges from −∞ to ∞ just as does a normal distribution.

(2) Like the standard normal distribution, the t-distribution is symmetrical with mean, mode and median equal to zero, except that $t_1$ (the t-distribution with one degree of freedom) has no mean.


(3) The t-distribution has a greater dispersion than the standard normal distribution. As n gets larger, the t-distribution approaches the normal form. Where n is as large as 30, the difference is very small. Relations between the t-distribution and the normal form are shown in the diagram on page A-3·30, in which are plotted t-curves for n = 2 and n = 25, together with a normal frequency curve. This diagram clearly shows that the form of the distribution varies as n changes. There is a specific distribution of t for every value of n.

[Figure: Frequency curves of the normal distribution and t-distributions]

* The number of degrees of freedom, generally denoted by v, is defined as the number N of independent observations in the sample (i.e., the sample size) minus the number K of population parameters which must be estimated from sample observations. Symbolically, v = N − K.


The t-Table. The t-table given at the end is the probability integral of the t-distribution. It gives, over a range of values of v, the probabilities of exceeding by chance values of t at different levels of significance. The t-distribution has a different value for each degree of freedom, and when the degrees of freedom are infinitely large, the t-distribution is equivalent to the normal distribution and the probabilities shown in the normal distribution tables are applicable.


Applications of the t-Distribution


The following are some of the examples to illustrate the way in which 
the 'Student' distribution is generally used to test the significance of the 
various results obtained from small samples. 


1. To test the significance of the mean of a random sample 


In determining whether the mean of a sample drawn from a normal population deviates significantly from a stated value (the hypothetical value of the population mean), we calculate the statistic:

$$t = \frac{(\bar{X} - \mu)\sqrt{n}}{S}$$

where $\bar{X}$ = the mean of the sample
μ = the actual or hypothetical mean of the population
n = the number of observations
S = the standard deviation of the sample, given by

$$S = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}} \quad\text{or}\quad S = \sqrt{\frac{\Sigma d^2 - (\bar{d})^2 \times n}{n - 1}}$$

where d = deviation from the assumed mean.

If the calculated value of t (ignoring its sign) exceeds $t_{0.05}$, we say that the difference between $\bar{X}$ and μ is significant at 5% level; if it exceeds $t_{0.01}$, the difference is said to be significant at 1% level. If t < $t_{0.05}$, we conclude that the difference between $\bar{X}$ and μ is not significant and hence the sample might have been drawn from a population with mean = μ.


Fiducial Limits of Population Mean. Assuming that the sample is a random sample from a normal population of unknown mean, the 95% fiducial limits of the population mean (μ) are:

$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.05}$$

and 99% limits are

$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.01}$$
The following examples will illustrate this test.


Illustration 28. Ten individuals are chosen at random from a population and their heights are found to be (in inches): 63, 63, 66, 67, 68, 69, 70, 70, 71 and 71. In the light of the data discuss the suggestion that the mean height in the population is 66 inches. (M.A. Econ., Delhi, 1972)

Solution. We have to test the hypothesis that the mean height in the population is 66 inches.

Applying t-test:

$$t = \frac{(\bar{X} - \mu)\sqrt{n}}{S}$$

CALCULATING MEAN AND STANDARD DEVIATION

Height in inches X    d = (X − 67)    d²
63                       −4           16
63                       −4           16
66                       −1            1
67                        0            0
68                       +1            1
69                       +2            4
70                       +3            9
70                       +3            9
71                       +4           16
71                       +4           16
ΣX = 678              Σd = 8       Σd² = 88

$$\bar{X} = \frac{\Sigma X}{n} = \frac{678}{10} = 67.8, \qquad \bar{d} = \frac{\Sigma d}{n} = \frac{8}{10} = 0.8$$

$$S = \sqrt{\frac{\Sigma d^2 - (\bar{d})^2 \times n}{n - 1}} = \sqrt{\frac{88 - (0.8)^2 \times 10}{9}} = \sqrt{\frac{81.6}{9}} = 3.011$$

$\bar{X} = 67.8$, $S = 3.011$, $\mu = 66$, $n = 10$

Substituting the values,

$$t = \frac{(67.8 - 66)\sqrt{10}}{3.011} = \frac{1.8 \times 3.162}{3.011} = 1.89$$

Degrees of freedom, v = (10 − 1) = 9.

For 9 degrees of freedom the table value of t at 5% level of significance ($t_{0.05}$) = 2.26.

Since the calculated value of t is less than the table value, the experiment provides no reason for doubting the hypothesis that the mean height in the population is 66 inches.
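Illustration 28 can be checked with Python's standard `statistics` module, whose `stdev` uses the $n - 1$ divisor required for small samples. The table value 2.26 is taken from the text, since the standard library does not provide t-distribution quantiles; `one_sample_t` is a hypothetical helper name.

```python
import math
import statistics


def one_sample_t(data, mu):
    """t = (X-bar - mu) * sqrt(n) / S, where S uses the (n - 1) divisor."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample S.D. with (n - 1) in the denominator
    t = (mean - mu) * math.sqrt(n) / s
    return mean, s, t, n - 1


heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
mean, s, t, df = one_sample_t(heights, 66)
T_TABLE_5PC_DF9 = 2.26                   # table value quoted in Illustration 28
print(f"mean = {mean}, S = {s:.3f}, t = {t:.2f}, v = {df}")
print("significant at 5% level" if abs(t) > T_TABLE_5PC_DF9 else "not significant at 5% level")
```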


Illustration 29. A random sample of size 16 has 53 as mean and the sum of the squares of the deviations taken from mean is 150. Can this sample be regarded as taken from the population having 56 as mean? Obtain 95% and 99% confidence limits of the mean of the population.
(B. Com., Gujarat, 1974)
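Solution. We have to test the hypothesis that the population mean is 56.

Applying t-test:

$$t = \frac{(\bar{X} - \mu)\sqrt{n}}{S}$$

$\bar{X} = 53$, $\mu = 56$, $n = 16$ and

$$S = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{150}{15}} = \sqrt{10} = 3.162$$

Substituting the values,

$$|t| = \frac{|53 - 56|\sqrt{16}}{3.162} = \frac{3 \times 4}{3.162} = 3.79$$

Degrees of freedom, v = (16 − 1) = 15.

For v = 15, $t_{0.05}$ = 2.13.

The calculated value of t is much higher than the table value and hence the result of the experiment does not support the hypothesis that the sample is drawn from the population having 56 as mean.

95% confidence limits of the population mean

$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.05} = 53 \pm \frac{3.162}{\sqrt{16}} \times 2.13 = 53 \pm 1.684 = 51.32 \text{ and } 54.68$$

99% confidence limits of the population mean (for v = 15, $t_{0.01}$ = 2.95)

$$\bar{X} \pm \frac{S}{\sqrt{n}}\,t_{0.01} = 53 \pm \frac{3.162}{4} \times 2.95 = 53 \pm 2.33$$

The required limits are 50.67 and 55.33.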
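The t-based limits of Illustration 29 are $\bar{X} \pm t\,S/\sqrt{n}$. In this sketch the table values 2.13 and 2.95 (v = 15) are supplied by hand, as in the text; the helper name `t_limits` is hypothetical.

```python
import math


def t_limits(mean, s, n, t_table):
    """Fiducial limits X-bar +/- t_table * S / sqrt(n)."""
    half_width = t_table * s / math.sqrt(n)
    return mean - half_width, mean + half_width


n, sample_mean, mu = 16, 53, 56
s = math.sqrt(150 / (n - 1))                      # S from the sum of squared deviations
t = abs(sample_mean - mu) * math.sqrt(n) / s
limits_95 = t_limits(sample_mean, s, n, 2.13)     # table t(0.05) for v = 15
limits_99 = t_limits(sample_mean, s, n, 2.95)     # table t(0.01) for v = 15
print(f"t = {t:.2f}")
print([round(x, 2) for x in limits_95], [round(x, 2) for x in limits_99])
```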
2. To test the difference between the means of two samples

Given two independent random samples of size $n_1$ and $n_2$, with means $\bar{X}_1$ and $\bar{X}_2$ and standard deviations $S_1$ and $S_2$, we may be interested in testing the hypothesis that the samples come from the same normal population. To carry out the test, we calculate the statistic t:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

where $\bar{X}_1$ = mean of the first sample
$\bar{X}_2$ = mean of the second sample
$n_1$ = number of observations in the first sample
$n_2$ = number of observations in the second sample
S = combined standard deviation.

The value of S is calculated as follows:
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saa / солга RS | 

n-n4—2 | 

When the actual means аге in fraction the deviations should be 

taken from assumed means. In sucha case the combined standard 
deviation is obtained by applying the following formula : 


s= EG — AY -E(G Ап А) — n — A): 
n-n,—2 
A,=Assumed mean of the first sample 
A,=Assumed mean of the second sample 
1= Actual mean of the first sample 
X,— Actual mean of the second sample. 
t here is based on (n, 4-7, —2) degrees of freedom. 


Ifthe calculated value of: be>%.95 (15.91), the difference between 
the sample means is said to be significant at 5% (1%) level of significance ; 
otherwise the data are said to be consistent with the hypothesis. 


The following examples will illustrate the test : 


Illustration 30. A group of seven-week-old chickens reared on a high-protein diet weigh 12, 15, 11, 16, 14, 14 and 16 ounces; a second group of five chickens, similarly treated except that they receive a low-protein diet, weigh 8, 10, 14, 10 and 13 ounces. Test whether there is significant evidence that additional protein has increased the weight of the chickens (the table value of t for v = 10 at 5% level of significance is 2.23).

(M.B.A., Delhi, 1974)

Solution. Let us take the hypothesis that additional protein has not increased the weight of chickens.

Applying t-test of the difference of means of two samples:
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}}$$

X1    (X1 − X̄1)   (X1 − X̄1)²     X2    (X2 − X̄2)   (X2 − X̄2)²
12      −2            4           8      −3            9
15      +1            1          10      −1            1
11      −3            9          14      +3            9
16      +2            4          10      −1            1
14       0            0          13      +2            4
14       0            0
16      +2            4
ΣX1 = 98   Σ(X1 − X̄1) = 0   Σ(X1 − X̄1)² = 22   ΣX2 = 55   Σ(X2 − X̄2) = 0   Σ(X2 − X̄2)² = 24

$$\bar{X}_1=\frac{\sum X_1}{n_1}=\frac{98}{7}=14;\qquad\bar{X}_2=\frac{\sum X_2}{n_2}=\frac{55}{5}=11$$
$$S=\sqrt{\frac{\sum(X_1-\bar{X}_1)^2+\sum(X_2-\bar{X}_2)^2}{n_1+n_2-2}}=\sqrt{\frac{22+24}{7+5-2}}=\sqrt{\frac{46}{10}}=2.14$$
$\bar{X}_1=14$, $\bar{X}_2=11$, $S=2.14$, $n_1=7$, $n_2=5$.
Substituting the values in the formula:
$$t=\frac{14-11}{2.14}\sqrt{\frac{7\times5}{7+5}}=\frac{3}{2.14}\times1.71=1.402\times1.71=2.397$$
$v=(n_1+n_2-2)=(7+5-2)=10$.

For $v=10$, $t_{0.05}=2.23$.

The calculated value of t is greater than the table value and hence the hypothesis does not hold good. We, therefore, conclude that additional protein has increased the weight of chickens.
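The pooled two-sample calculation can be sketched the same way. The helper below is our own (not from any library); it takes the raw observations, forms S from the sums of squared deviations about the actual means and returns t, S and the degrees of freedom. Without rounding S and the square-root factor, t for the chicken data comes to about 2.389 rather than 2.397; the conclusion is the same.

```python
import math


def pooled_t(x1, x2):
    """Two-sample t with combined S = sqrt((SS1 + SS2) / (n1 + n2 - 2)).

    Returns (t, S, degrees_of_freedom), where
    t = (mean1 - mean2) / S * sqrt(n1 * n2 / (n1 + n2)).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)   # squared deviations, sample 1
    ss2 = sum((v - m2) ** 2 for v in x2)   # squared deviations, sample 2
    df = n1 + n2 - 2
    s = math.sqrt((ss1 + ss2) / df)
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, s, df


high = [12, 15, 11, 16, 14, 14, 16]
low = [8, 10, 14, 10, 13]
t, s, df = pooled_t(high, low)
print(round(t, 3), round(s, 3), df)   # 2.389 2.145 10
```

The same function can be applied unchanged to the raw data of Illustrations 31, 34, 35 and 44.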

Illustration 31. For a random sample of 10 pigs, fed on diet A, the increases in weight in pounds in a certain period were:

10, 6, 16, 17, 13, 12, 8, 14, 15, 9

For another random sample of 12 pigs, fed on diet B, the increases in the same period were:

7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17

Test whether the diets A and B differ significantly as regards their effect on increase in weight.

Given the following:

Degrees of freedom    Value of t at 5% level
19                    2.09
20                    2.09
21                    2.08
22                    2.07
23                    2.07

(M. Com., Meerut, 1974)
Solution. Let us take the null hypothesis that diets A and B do not differ significantly as regards their effect on increase in weight. Applying t-test,
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}},\qquad S=\sqrt{\frac{\sum(X_1-\bar{X}_1)^2+\sum(X_2-\bar{X}_2)^2}{n_1+n_2-2}}$$
Calculating the required values:

Pigs fed on diet A                              Pigs fed on diet B
Increase in   Deviation from      (X1 − X̄1)²    Increase in   Deviation from      (X2 − X̄2)²
weight X1     mean 12 (X1 − X̄1)                 weight X2     mean 15 (X2 − X̄2)
10            −2                   4             7             −8                  64
6             −6                  36            13             −2                   4
16            +4                  16            22             +7                  49
17            +5                  25            15              0                   0
13            +1                   1            12             −3                   9
12             0                   0            14             −1                   1
8             −4                  16            18             +3                   9
14            +2                   4             8             −7                  49
15            +3                   9            21             +6                  36
9             −3                   9            23             +8                  64
                                                10             −5                  25
                                                17             +2                   4
ΣX1 = 120   Σ(X1 − X̄1) = 0   Σ(X1 − X̄1)² = 120   ΣX2 = 180   Σ(X2 − X̄2) = 0   Σ(X2 − X̄2)² = 314

Mean increase in weight of 10 pigs fed on diet A:
$$\bar{X}_1=\frac{\sum X_1}{n_1}=\frac{120}{10}=12\text{ pounds}$$
Mean increase in weight of 12 pigs fed on diet B:
$$\bar{X}_2=\frac{\sum X_2}{n_2}=\frac{180}{12}=15\text{ pounds}$$
$$S=\sqrt{\frac{120+314}{10+12-2}}=\sqrt{\frac{434}{20}}=4.66$$
$\bar{X}_1=12$, $\bar{X}_2=15$, $n_1=10$, $n_2=12$, $S=4.66$. Substituting the values in the above formula,
$$|t|=\frac{|12-15|}{4.66}\sqrt{\frac{10\times12}{10+12}}=0.644\times2.335=1.50$$
$v=n_1+n_2-2=10+12-2=20$.

For v = 20 at 5 per cent level the table value of t is 2.09. The calculated value is less than the table value and hence the experiment provides no evidence against the hypothesis. We, therefore, conclude that diets A and B do not differ significantly as regards their effect on increase in weight.


Illustration 32. Two laboratories A and B carry out independent estimates of fat content in ice-cream made by a firm. A sample is taken from each batch, halved, and the separate halves sent to the two laboratories. The fat content obtained by the laboratories is recorded below:

Batch No.   1   2   3   4   5   6   7   8   9   10
Lab. A      7   8   7   3   8   6   9   4   7   8
Lab. B      9   8   8   4   7   7   9   6   6   6
(The fat contents are given in grammes.)

Is there a significant difference between the mean fat content obtained by the two laboratories, A and B?

You may use the following extracts from the t table in answering the question:
Degrees of freedom   6     7     8     9     10    16    18    20
t at 5% level        2.45  2.36  2.31  2.26  2.23  2.12  2.10  2.09
(M.Com., Delhi, 1973)

Solution. Let us take the hypothesis that there is no difference between the mean fat content obtained by the two laboratories, A and B. Applying t-test,
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}}$$

Since the actual means are in fractions, we shall take deviations from the assumed means (6 for Lab. A and 7 for Lab. B).

         Lab. A                          Lab. B
Batch    X1   (X1 − 6)   (X1 − 6)²       X2   (X2 − 7)   (X2 − 7)²
1        7    +1          1              9    +2          4
2        8    +2          4              8    +1          1
3        7    +1          1              8    +1          1
4        3    −3          9              4    −3          9
5        8    +2          4              7     0          0
6        6     0          0              7     0          0
7        9    +3          9              9    +2          4
8        4    −2          4              6    −1          1
9        7    +1          1              6    −1          1
10       8    +2          4              6    −1          1
ΣX1 = 67   Σ(X1 − 6) = 7   Σ(X1 − 6)² = 37   ΣX2 = 70   Σ(X2 − 7) = 0   Σ(X2 − 7)² = 22

$$\bar{X}_1=\frac{67}{10}=6.7,\qquad\bar{X}_2=\frac{70}{10}=7.0$$
$$S=\sqrt{\frac{\sum(X_1-A_1)^2+\sum(X_2-A_2)^2-n_1(\bar{X}_1-A_1)^2-n_2(\bar{X}_2-A_2)^2}{n_1+n_2-2}}$$
$$=\sqrt{\frac{37+22-10(6.7-6)^2-10(7-7)^2}{10+10-2}}=\sqrt{\frac{59-4.9-0}{18}}=\sqrt{\frac{54.1}{18}}=1.734$$
$$|t|=\frac{|6.7-7.0|}{1.734}\sqrt{\frac{10\times10}{10+10}}=\frac{0.3}{1.734}\times2.236=0.387$$
$v=n_1+n_2-2=18$; for $v=18$, $t_{0.05}=2.10$.
The calculated value of t is less than the table value and hence there is no reason to doubt the hypothesis. We, therefore, conclude that the mean fat contents obtained by the two laboratories A and B do not differ significantly.
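Because the assumed-mean formula is only a computational shortcut, it should give exactly the same S as deviations from the actual means. The sketch below checks this on the laboratory data, using our own helper names and the assumed means 6 and 7 from the table, and then recomputes t.

```python
import math


def combined_s_assumed(x1, x2, a1, a2):
    """Combined S from deviations about assumed means a1 and a2:

    S^2 = [sum(x1-a1)^2 + sum(x2-a2)^2 - n1(m1-a1)^2 - n2(m2-a2)^2] / (n1+n2-2)
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    num = (sum((v - a1) ** 2 for v in x1) + sum((v - a2) ** 2 for v in x2)
           - n1 * (m1 - a1) ** 2 - n2 * (m2 - a2) ** 2)
    return math.sqrt(num / (n1 + n2 - 2))


def combined_s_direct(x1, x2):
    """Combined S from deviations about the actual means."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    return math.sqrt(ss / (n1 + n2 - 2))


lab_a = [7, 8, 7, 3, 8, 6, 9, 4, 7, 8]
lab_b = [9, 8, 8, 4, 7, 7, 9, 6, 6, 6]
s_short = combined_s_assumed(lab_a, lab_b, 6, 7)
s_long = combined_s_direct(lab_a, lab_b)
n1, n2 = len(lab_a), len(lab_b)
t = abs(sum(lab_a) / n1 - sum(lab_b) / n2) / s_short * math.sqrt(n1 * n2 / (n1 + n2))
print(round(s_short, 3), round(s_long, 3), round(t, 3))   # 1.734 1.734 0.387
```

Any choice of assumed means gives the same S; convenient values only keep the hand arithmetic small.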


Illustration 33. The mean life of a sample of 10 electric light bulbs was found to be 1,456 hours with standard deviation of 423 hours. A second sample of 17 bulbs chosen from a different batch showed a mean life of 1,280 hours with standard deviation of 398 hours. Is there a significant difference between the means of the two batches?

(M. Com., Delhi, 1975)

Solution. Let us take the hypothesis that the means of the two batches do not differ significantly. Applying t-test:
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}}$$
$\bar{X}_1=1456$, $n_1=10$, $S_1=423$
$\bar{X}_2=1280$, $n_2=17$, $S_2=398$
$$S=\sqrt{\frac{n_1S_1^2+n_2S_2^2}{n_1+n_2-2}}=\sqrt{\frac{10(423)^2+17(398)^2}{10+17-2}}=\sqrt{\frac{1789290+2692868}{25}}=\sqrt{179286.32}=423.42$$
$$t=\frac{1456-1280}{423.42}\sqrt{\frac{10\times17}{10+17}}=\frac{176}{423.42}\times2.51=1.043$$
$v=10+17-2=25$; for $v=25$, $t_{0.05}=2.06$.

The calculated value of t is less than the table value. Hence the hypothesis holds true. We, therefore, conclude that the means of the two batches do not differ significantly.
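When only summary figures are available, as in Illustrations 33 and 45, the combined S is built from n·s² terms. The sketch below assumes, as the formula in the text implies, that the quoted standard deviations were computed with divisor n; if they had been computed with divisor n − 1, each n·s² term would be replaced by (n − 1)·s². The function name is ours.

```python
import math


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Two-sample t from means and standard deviations only.

    Assumes s1 and s2 were computed with divisor n, so that
    S = sqrt((n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)).
    Returns (t, S, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    s = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / df)
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, s, df


t33, s33, df33 = pooled_t_from_summary(1456, 423, 10, 1280, 398, 17)
print(round(t33, 3), round(s33, 2), df33)   # 1.043 423.42 25

t45, s45, df45 = pooled_t_from_summary(300, 12, 12, 335, 15, 15)
print(round(t45, 3), round(s45, 3), df45)   # -6.325 14.287 25
```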


Illustration 34. The following data represent the yields in bushels of equal areas of two agricultural plots, in which plot I was treated the same as plot II except for the amount of phosphorus applied as fertilizer. Is there a significant difference between the yields on the two plots? Apply t-test, using the difference between their means as a criterion of judgment.

Plot I      Plot II
6.2         5.6
5.7         5.9
6.5         5.6
6.0         5.7
6.3         5.8
5.8         5.7
5.7         6.0
6.0         5.5
6.0         5.7
5.8         5.5
Mean 6.0    Mean 5.7

(M. Com., Meerut, 1975)
Solution. Let us take the hypothesis that the yields on the two plots do not differ significantly. Applying t-test,
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}}$$

X1    (X1 − X̄1)   (X1 − X̄1)²    X2    (X2 − X̄2)   (X2 − X̄2)²
6.2   +0.2          0.04         5.6   −0.1          0.01
5.7   −0.3          0.09         5.9   +0.2          0.04
6.5   +0.5          0.25         5.6   −0.1          0.01
6.0    0            0            5.7    0            0
6.3   +0.3          0.09         5.8   +0.1          0.01
5.8   −0.2          0.04         5.7    0            0
5.7   −0.3          0.09         6.0   +0.3          0.09
6.0    0            0            5.5   −0.2          0.04
6.0    0            0            5.7    0            0
5.8   −0.2          0.04         5.5   −0.2          0.04
ΣX1 = 60   Σ(X1 − X̄1) = 0   Σ(X1 − X̄1)² = 0.64   ΣX2 = 57   Σ(X2 − X̄2) = 0   Σ(X2 − X̄2)² = 0.24


SAMPLING AND TESTS OF SIGNIFICANCE A-339 


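$$\bar{X}_1=\frac{\sum X_1}{n_1}=\frac{60}{10}=6.0;\qquad\bar{X}_2=\frac{\sum X_2}{n_2}=\frac{57}{10}=5.7$$
$$S=\sqrt{\frac{\sum(X_1-\bar{X}_1)^2+\sum(X_2-\bar{X}_2)^2}{n_1+n_2-2}}=\sqrt{\frac{0.64+0.24}{10+10-2}}=\sqrt{\frac{0.88}{18}}=0.221$$
$$t=\frac{6.0-5.7}{0.221}\sqrt{\frac{10\times10}{10+10}}=\frac{0.3}{0.221}\times2.236=3.04$$
$v=n_1+n_2-2=10+10-2=18$; for $v=18$, $t_{0.05}=2.10$.
The calculated value of t is greater than the table value. Hence the hypothesis does not hold good. We, therefore, conclude that there is a significant difference between the yields on the two plots.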


Illustration 35. Two kinds of manure were applied to sixteen one-acre plots, other conditions remaining the same. The yields (in quintals) are set out below:

Manure I    18, 20, 36, 50, 49, 36, 34, 49, 41 (9 plots)
Manure II   29, 28, 26, 35, 30, 44, 46 (7 plots)

Examine the significance of the difference between the mean yields due to the application of different kinds of manure. (M. Com., Meerut, 1972)

Table of 't'
v                 9     10    11    12    13    14    15    16
5% points of t    2.26  2.23  2.20  2.18  2.16  2.14  2.13  2.12
Solution. Let us take the hypothesis that the kind of manure does not affect the yield. Applying t-test,
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}}$$

Manure I                            Manure II
X1    (X1 − X̄1)   (X1 − X̄1)²       X2    (X2 − X̄2)   (X2 − X̄2)²
18    −19           361             29    −5             25
20    −17           289             28    −6             36
36     −1             1             26    −8             64
50    +13           169             35    +1              1
49    +12           144             30    −4             16
36     −1             1             44    +10           100
34     −3             9             46    +12           144
49    +12           144
41     +4            16
ΣX1 = 333   Σ(X1 − X̄1) = 0   Σ(X1 − X̄1)² = 1134   ΣX2 = 238   Σ(X2 − X̄2) = 0   Σ(X2 − X̄2)² = 386

$$\bar{X}_1=\frac{\sum X_1}{n_1}=\frac{333}{9}=37;\qquad\bar{X}_2=\frac{\sum X_2}{n_2}=\frac{238}{7}=34$$
$$S=\sqrt{\frac{1134+386}{9+7-2}}=\sqrt{\frac{1520}{14}}=10.42$$
$$t=\frac{37-34}{10.42}\sqrt{\frac{9\times7}{9+7}}=\frac{3}{10.42}\times1.984=0.571$$
$v=n_1+n_2-2=9+7-2=14$; for $v=14$, $t_{0.05}=2.14$.
The calculated value of t is less than the table value and hence the hypothesis holds true. We, therefore, conclude that the mean yields do not differ significantly due to the application of different kinds of manure.


The ‘Difference Test’

In case of paired data, i.e., where we have the same individuals but are trying to find out the effect of a certain drug, or we have the same students but are interested in finding out the effect of a certain coaching class, the ‘difference test’ is applicable. While applying this test the value of t is obtained as follows:
$$t=\frac{\bar{d}-0}{S}\sqrt{n}\qquad\text{or}\qquad t=\frac{\bar{d}\sqrt{n}}{S}$$
where $\bar{d}$ = the mean of the differences
S = the standard deviation of the differences.

The value of S is calculated as follows:
$$S=\sqrt{\frac{\sum(d-\bar{d})^2}{n-1}}\qquad\text{or}\qquad S=\sqrt{\frac{\sum d^2-n(\bar{d})^2}{n-1}}$$

It should be noted that t is based on n − 1 degrees of freedom.


The following examples will illustrate the application of difference 
test : 


Illustration 36. Eleven school boys were given a test in geometry. They were given a month's tuition and a second test was held at the end of it. Do the marks give evidence that the students have benefited by the extra coaching?

Boys   Marks (1st test)   Marks (2nd test)
1      23                 24
2      20                 19
3      19                 22
4      21                 18
5      18                 20
6      20                 22
7      18                 20
8      17                 20
9      23                 23
10     16                 20
11     19                 17

Solution. Let us take the hypothesis that the students have not benefited by the extra coaching and hence the expected mean difference in the results is zero.

Applying t-test (difference formula):
$$t=\frac{\bar{d}\sqrt{n}}{S}$$

Boys   Marks        Marks        d = (2nd test − 1st test)   d²
       (1st test)   (2nd test)
1      23           24            1                           1
2      20           19           −1                           1
3      19           22            3                           9
4      21           18           −3                           9
5      18           20            2                           4
6      20           22            2                           4
7      18           20            2                           4
8      17           20            3                           9
9      23           23            0                           0
10     16           20            4                          16
11     19           17           −2                           4
                                 Σd = 11                     Σd² = 61

$$\bar{d}=\frac{\sum d}{n}=\frac{11}{11}=1$$
$$S=\sqrt{\frac{\sum d^2-n(\bar{d})^2}{n-1}}=\sqrt{\frac{61-(1)^2\times11}{10}}=\sqrt{\frac{50}{10}}=2.236$$
$\bar{d}=1$, $S=2.236$, $n=11$.
Substituting the values,
$$t=\frac{1\times\sqrt{11}}{2.236}=\frac{3.317}{2.236}=1.48$$
$v=n-1=11-1=10$; for $v=10$, $t_{0.05}=2.228$.

The calculated value of t is less than the table value and hence the results of the experiment do not provide any evidence against the hypothesis. We, therefore, conclude that the students have not benefited by the extra coaching.
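The difference test can be written as a small function. It is a sketch with a hypothetical helper name; d is taken as second observation minus first, matching the table above, and S uses the Σd² − n·d̄² form. Applied to the blood-pressure changes of Illustration 37 the same function gives t ≈ 2.898 (2.896 in the worked solution, from rounding d̄ to 2.583).

```python
import math


def paired_t(before, after):
    """Difference test: t = mean(d) * sqrt(n) / S, with d = after - before and
    S = sqrt((sum d^2 - n * mean(d)^2) / (n - 1)); returns (t, S, n - 1)."""
    d = [b - a for a, b in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    s = math.sqrt((sum(x * x for x in d) - n * d_bar ** 2) / (n - 1))
    return d_bar * math.sqrt(n) / s, s, n - 1


first = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
second = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]
t, s, df = paired_t(first, second)
print(round(t, 2), round(s, 3), df)   # 1.48 2.236 10

# Illustration 37 gives the changes directly, so use zeros as the "before" values
changes = [5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6]
t37, s37, df37 = paired_t([0] * len(changes), changes)
print(round(t37, 3), df37)            # 2.898 11
```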


Illustration 37. A certain stimulus administered to each of 12 patients resulted in the following change in blood pressure:

5, 2, 8, −1, 3, 0, −2, 1, 5, 0, 4, 6.

Can it be concluded that the stimulus will in general be accompanied by an increase in blood pressure? (B.Sc., Gujarat, 1970; B.A., Bombay, 1975)

Solution. Let us take the hypothesis that the stimulus does not increase the blood pressure.

Applying difference test:
$$t=\frac{\bar{d}\sqrt{n}}{S}$$

d      d²
5      25
2       4
8      64
−1      1
3       9
0       0
−2      4
1       1
5      25
0       0
4      16
6      36
Σd = 31   Σd² = 185

$$\bar{d}=\frac{\sum d}{n}=\frac{31}{12}=2.583$$
$$S=\sqrt{\frac{\sum d^2-n(\bar{d})^2}{n-1}}=\sqrt{\frac{185-(2.583)^2\times12}{11}}=\sqrt{\frac{185-80.08}{11}}=3.09$$
$\bar{d}=2.583$, $S=3.09$, $n=12$.
$$t=\frac{2.583\times\sqrt{12}}{3.09}=2.896$$
$v=(n-1)=(12-1)=11$; for $v=11$, $t_{0.05}=2.20$.
The calculated value of t is greater than the table value and hence the result of the experiment does not support the hypothesis. We, therefore, conclude that the stimulus, in general, is accompanied by an increase in blood pressure.
3. To test the Significance of an Observed Correlation Coefficient

Given a random sample from a bivariate normal population, if we are to test the hypothesis that the correlation coefficient of the population is zero, i.e., the variables in the population are uncorrelated, we have to apply the following test:
$$t=\frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2}$$
Here t is based on (n − 2) degrees of freedom.

If the calculated value of t exceeds $t_{0.05}$ for (n − 2) d.f., we say that the value of r is significant at 5% level. If $t<t_{0.05}$, the data are consistent with the hypothesis of an uncorrelated population.


The following examples will illustrate the test : 


Illustration 38. A random sample of 27 pairs of observations from a normal population gives a correlation coefficient of 0.42. Is it likely that the variables in the population are uncorrelated? (B. Com., Bombay, 1976)

Solution. Let us take the hypothesis that the variables in the population are uncorrelated. Applying t-test,




$$t=\frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2}$$
Here r = 0.42, n = 27. Substituting the values,
$$t=\frac{0.42}{\sqrt{1-(0.42)^2}}\times\sqrt{27-2}=\frac{0.42\times5}{0.908}=2.31$$
$v=(n-2)=(27-2)=25$.
For $v=25$, $t_{0.05}=2.06$.
The calculated value of t is greater than the table value and hence the result of the experiment does not support the hypothesis that the variables in the population are uncorrelated.
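The correlation t-statistic is a one-line formula. The sketch below (helper name ours) returns t together with its n − 2 degrees of freedom; the critical value still has to be read from the t table.

```python
import math


def t_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), returned with its n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2


t38, df38 = t_for_r(0.42, 27)
print(round(t38, 2), df38)   # 2.31 25
```

The same call reproduces Illustrations 40 and 41, e.g. `t_for_r(0.5, 11)` gives about 1.732.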


Illustration 39. How many pairs of observations must be included in a sample in order that an observed correlation coefficient of value 0.42 shall have a calculated value of t greater than 2.72?

Solution.
$$t=\frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2}$$
We are given the values of t and r and we have to find out n.
$$2.72=\frac{0.42}{\sqrt{1-(0.42)^2}}\times\sqrt{n-2}=\frac{0.42}{0.908}\times\sqrt{n-2}$$
$$\sqrt{n-2}\times0.42=2.72\times0.908=2.47$$
$$\sqrt{n-2}=\frac{2.47}{0.42}=5.88$$
$$n-2=(5.88)^2=34.57$$
n = 34.57 + 2 = 36.57, i.e., 37 approx.
Hence, we should include 37 pairs of observations.
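Solving the t formula for n gives n = 2 + t²(1 − r²)/r², which makes the calculation above direct. The sketch below (our own helper) returns the exact root and the next whole number; without the intermediate rounding to 0.908 the root is about 36.54, so the answer is still 37.

```python
import math


def pairs_needed(r, t_target):
    """Smallest whole n for which r * sqrt(n - 2) / sqrt(1 - r^2) exceeds t_target.

    Solving t = r*sqrt(n-2)/sqrt(1-r^2) for n gives n = 2 + t^2 (1 - r^2) / r^2.
    """
    n_exact = 2 + t_target ** 2 * (1 - r * r) / (r * r)
    n = math.floor(n_exact) + 1   # next whole number strictly above n_exact
    return n_exact, n


n_exact, n = pairs_needed(0.42, 2.72)
print(round(n_exact, 2), n)   # 36.54 37
```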
Illustration 40. The following table gives the ages in years of 10 husbands and their wives at marriage. Compute the correlation coefficient and test for its significance.
Husband's age   23  27  28  29  30  31  33  35  36  39
Wife's age      18  22  23  24  25  26  28  29  30  32
(B. Sc., Madras, 1976)
Solution. We have to test the significance of the correlation coefficient.* Applying t-test,
$$t=\frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2}=\frac{0.939}{\sqrt{1-(0.939)^2}}\times\sqrt{10-2}=\frac{0.939}{\sqrt{1-0.882}}\times2.8284=\frac{0.939}{0.344}\times2.8284=7.72$$
$v=n-2=(10-2)=8$.
For $v=8$, $t_{0.05}=2.31$.

The calculated value being much higher than the table value, the correlation is significant.

*For this question r = 0.939. For calculations refer to the chapter on Correlation Analysis.
Illustration 41. Is a correlation coefficient of 0.5 significant if obtained from a random sample of 11 pairs of values from a normal population? Use t-test. (M. Com., Meerut, 1976)
Solution. We have to test the significance of the observed correlation coefficient.
$$t=\frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2}=\frac{0.5}{\sqrt{1-(0.5)^2}}\times\sqrt{11-2}=\frac{0.5}{0.866}\times3=1.732$$
$v=11-2=9$
For $v=9$, $t_{0.05}=2.26$.

The calculated value of t is less than the table value and hence the given correlation coefficient is not significant.


Illustration 42. Find the least value of r in a sample of 27 pairs from a bivariate normal population significant at 5% level. (M.A. Econ., Punjab, 1975)
Solution.
$$t=\frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2}=\frac{r}{\sqrt{1-r^2}}\times\sqrt{27-2}=\frac{5r}{\sqrt{1-r^2}}$$
Now for r to be significant at 5% level, the calculated value of t should be greater than the table value of t at 5% level for 25 degrees of freedom, which is 2.06.
$$\frac{5r}{\sqrt{1-r^2}}=2.06$$
$$25r^2=4.2436(1-r^2)$$
$$29.2436\,r^2=4.2436$$
$$r=\sqrt{\frac{4.2436}{29.2436}}=0.381$$
Hence the required least value of r in a sample of 27 pairs to be significant at 5% level is 0.381.
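Rearranging t = r√(n − 2)/√(1 − r²) gives r² (n − 2) = t² (1 − r²), i.e. r = t/√(t² + n − 2), which is the algebra carried out step by step above. A minimal sketch, with a helper name of our own and the table value 2.06 supplied by hand:

```python
import math


def least_significant_r(t_table, n):
    """Smallest r that reaches the tabulated t for n - 2 d.f.

    From t = r*sqrt(n-2)/sqrt(1-r^2): r^2 (n - 2) = t^2 (1 - r^2),
    so r = t / sqrt(t^2 + n - 2).
    """
    return t_table / math.sqrt(t_table ** 2 + n - 2)


r_min = least_significant_r(2.06, 27)
print(round(r_min, 3))   # 0.381
```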


Illustration 43. Ten cartons are taken at random from an automatic filling machine. The mean net weight of the cartons is 11.8 oz. and the standard deviation 0.15 oz. Does the sample mean differ significantly from the intended weight of 12 oz.? (Given for v = 9, $t_{0.05}=2.26$.) (M.B.A. part-time, Delhi, 1977)

Solution. Let us take the hypothesis that the sample mean does not differ significantly from the intended weight of 12 oz. Applying t-test,
$$t=\frac{\bar{X}-\mu}{S}\sqrt{n}$$
$\bar{X}=11.8$, $\mu=12$, $S=0.15$, $n=10$
$$|t|=\frac{|11.8-12|}{0.15}\times\sqrt{10}=\frac{0.2\times3.162}{0.15}=4.216$$
For $v=9$, $t_{0.05}=2.26$.
The calculated value of t is greater than the table value and hence the null hypothesis is rejected. We, therefore, conclude that the sample mean differs significantly from the intended weight of 12 oz.


Illustration 44. The heights of six randomly chosen soldiers are in inches: 76, 70, 68, 69, 69 and 68. Those of 6 randomly chosen sailors are 68, 64, 65, 69, 72, 64. Discuss the light that these data throw on the suggestion that soldiers are, on the average, taller than sailors. Use t-test. (M. Com., Meerut, 1977)

Solution. Let us take the hypothesis that there is no difference in height of soldiers and sailors. Applying t-test,
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}}$$

Soldiers                                Sailors
Height X1   (X1 − X̄1)   (X1 − X̄1)²     Height X2   (X2 − X̄2)   (X2 − X̄2)²
76          +6            36            68          +1             1
70           0             0            64          −3             9
68          −2             4            65          −2             4
69          −1             1            69          +2             4
69          −1             1            72          +5            25
68          −2             4            64          −3             9
ΣX1 = 420   Σ(X1 − X̄1)² = 46            ΣX2 = 402   Σ(X2 − X̄2)² = 52

$$\bar{X}_1=\frac{420}{6}=70;\qquad\bar{X}_2=\frac{402}{6}=67$$
$$S=\sqrt{\frac{\sum(X_1-\bar{X}_1)^2+\sum(X_2-\bar{X}_2)^2}{n_1+n_2-2}}=\sqrt{\frac{46+52}{6+6-2}}=\sqrt{\frac{98}{10}}=3.13$$
$$t=\frac{70-67}{3.13}\sqrt{\frac{6\times6}{6+6}}=\frac{3}{3.13}\times1.732=1.66$$
$v=n_1+n_2-2=6+6-2=10$

For $v=10$, $t_{0.05}=2.23$.

The calculated value of t is less than the table value. Hence there is no reason to doubt the hypothesis, and the data do not show that soldiers are, on the average, taller than sailors.


Illustration 45. Two working designs are under consideration for adoption in a plant. A time and motion study shows that 12 workers using design A have a mean assembly time of 300 seconds with a standard deviation of 12 seconds, and that 15 workers using design B have a mean assembly time of 335 seconds with a standard deviation of 15 seconds. Is the difference in the mean assembly time between the two working designs significant at one per cent level of significance? The following table gives some t-values which may be used:

Level of significance     Degrees of freedom
                          25      26      27
p = 0.05                  2.06    2.06    2.05
p = 0.01                  2.79    2.78    2.77

(M. Com., Delhi, 1977)

Solution. Let us take the hypothesis that the difference between the two designs A and B in respect of mean assembly time is not significant. Applying t-test,
$$t=\frac{\bar{X}_1-\bar{X}_2}{S}\sqrt{\frac{n_1n_2}{n_1+n_2}}$$
$\bar{X}_1=300$, $S_1=12$, $n_1=12$

$\bar{X}_2=335$, $S_2=15$, $n_2=15$
$$S=\sqrt{\frac{n_1S_1^2+n_2S_2^2}{n_1+n_2-2}}=\sqrt{\frac{12(12)^2+15(15)^2}{12+15-2}}=\sqrt{\frac{1728+3375}{25}}=14.287$$
$$|t|=\frac{|300-335|}{14.287}\sqrt{\frac{12\times15}{12+15}}=\frac{35}{14.287}\times2.582=6.326$$
$v=n_1+n_2-2=12+15-2=25$; for $v=25$, $t_{0.01}=2.79$.
The calculated value of t is greater than the table value. Hence the hypothesis does not hold true. We, therefore, conclude that the difference in the mean assembly time between the two working designs is significant at 1 per cent level.


Illustration 46. In a certain factory there are two independent processes manufacturing the same item. The average weight in a sample of 250 items produced from one process is found to be 120 oz. with a standard deviation of 12 oz., while the corresponding figures in a sample of 400 items from the other process are 124 and 14. Is the difference between the mean weights significant at 1 per cent level of significance?

(M. Com., Delhi, 1977)

Solution. Let us take the hypothesis that there is no difference between the mean weights of the items produced by the two independent processes. Calculating the standard error of the difference of means:
$$S.E._{(\bar{X}_1-\bar{X}_2)}=\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$
$n_1=250$, $\sigma_1=12$, $n_2=400$, $\sigma_2=14$
$$S.E.=\sqrt{\frac{(12)^2}{250}+\frac{(14)^2}{400}}=\sqrt{0.576+0.490}=\sqrt{1.066}=1.03$$
Difference of means $=\bar{X}_1-\bar{X}_2=120-124=-4$
$$\frac{|\text{Difference}|}{S.E.}=\frac{4}{1.03}=3.88$$

Since the difference is more than 2.58 S.E. (1% level), the hypothesis is rejected. Hence the difference in the mean weight of the two processes is significant.
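The large-sample comparison of two means uses the standard normal variate rather than t. A sketch with a hypothetical helper name; with the unrounded S.E. (about 1.0325) the ratio is about 3.87, still well above 2.58.

```python
import math


def z_two_means(m1, sd1, n1, m2, sd2, n2):
    """Large-sample test: z = (m1 - m2) / sqrt(sd1^2/n1 + sd2^2/n2); returns (z, S.E.)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se, se


z, se = z_two_means(120, 12, 250, 124, 14, 400)
print(round(se, 4), round(z, 3))   # 1.0325 -3.874
```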


Cautions while Using t-test

While drawing inferences on the basis of the t-test it should be remembered that the conclusions arrived at on the basis of the t-test are justified only if the assumptions upon which the test is based are true. If the actual distribution is not normal then, strictly speaking, the t-test is not justified for small samples. If the ... sample, then the assumption that the observations are independent is not justified and the conclusions based on the test may not be correct. The effect of violating the normality assumption ... is fairly large when dealing with small samples. However, it is a good idea to check the ... A review of similar samples or related ... as to whether or not the population is ...
Z-test of the Significance of the Correlation Coefficient

Prof. Fisher has given a method of testing the significance of the correlation coefficient in small samples. In this method the coefficient of correlation in the sample, r, is transformed into Z, and that is why the method is called the Z-transformation. The statistic Z given by Prof. Fisher is used to test (i) whether an observed value of r differs significantly from some hypothetical value, or (ii) whether two sample values of r differ significantly.

For testing whether r differs significantly from zero, the t-test is preferable.

In order to apply the test we have to calculate Z and ξ by applying Fisher's transformation and then calculate the value of the standard normal variate
$$\frac{Z-\xi}{1/\sqrt{n-3}}$$
If the absolute value of this statistic exceeds 1.96, the difference is significant at 5% level.

Here
$$Z=\frac{1}{2}\log_e\frac{1+r}{1-r}\quad\text{or}\quad1.1513\log_{10}\frac{1+r}{1-r}$$
$$\xi=\frac{1}{2}\log_e\frac{1+\rho}{1-\rho}\quad\text{or}\quad1.1513\log_{10}\frac{1+\rho}{1-\rho}$$
ρ refers to the population correlation coefficient.
$$S.E._Z=\frac{1}{\sqrt{n-3}}$$
The following examples will illustrate the test : 


Illustration 47. Test the significance of the correlation r = 0.5 from a sample of size 18 against hypothetical correlation ρ = 0.7. (B. Com., Bombay, 1974)

Solution. We have to test the hypothesis that correlation in the population is 0.7.
Applying Z-transformation,
$$Z=1.1513\log_{10}\frac{1+0.5}{1-0.5}=1.1513\log_{10}3=1.1513\times0.4771=0.549$$
$$\xi=1.1513\log_{10}\frac{1+0.7}{1-0.7}=1.1513\log_{10}5.67=1.1513\times0.7536=0.868$$
$$|Z-\xi|=|0.549-0.868|=0.319$$
$$S.E._Z=\frac{1}{\sqrt{n-3}}=\frac{1}{\sqrt{15}}=\frac{1}{3.873}=0.258$$
$$\frac{\text{Difference}}{S.E.}=\frac{0.319}{0.258}=1.236$$
Since the difference is less than 1.96 S.E. (5% level of significance), it could have arisen due to fluctuations of sampling. Hence ρ may very well be 0.7.
Illustration 48. From a sample of 19 pairs of observations the correlation is 0.5 and the corresponding population value is 0.3. Is the difference significant?
(B. Com., Gujarat, 1973)

Solution. Using Z-transformation,
$$Z=1.1513\log_{10}\frac{1+0.5}{1-0.5}=1.1513\log_{10}3=1.1513\times0.4771=0.549$$
$$\xi=1.1513\log_{10}\frac{1+0.3}{1-0.3}=1.1513\log_{10}1.857=1.1513\times0.2688=0.310$$
$$Z-\xi=0.549-0.310=0.239$$
$$S.E._Z=\frac{1}{\sqrt{n-3}}=\frac{1}{\sqrt{16}}=0.25$$
$$\frac{\text{Difference}}{S.E.}=\frac{0.239}{0.25}=0.956$$
Since the difference between Z and ξ is less than 2.58 S.E. (1% level), the difference could have arisen due to fluctuations of sampling. Hence the given correlation coefficient is not significantly different from 0.3.
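Python's `math.atanh` computes ½ logₑ[(1 + r)/(1 − r)] directly, so Fisher's Z does not need the 1.1513 log₁₀ form. The sketch below (helper name ours) returns the standard normal variate (Z − ξ)√(n − 3); the small differences from the hand results (1.236 and 0.956) come from rounding the logarithms.

```python
import math


def fisher_z_test(r, rho, n):
    """Compare a sample r with a hypothetical rho via Fisher's transformation.

    Z = atanh(r) = 0.5*ln((1+r)/(1-r)), xi = atanh(rho), S.E. = 1/sqrt(n-3).
    Returns the standard normal variate (Z - xi) / S.E.
    """
    z, xi = math.atanh(r), math.atanh(rho)
    return (z - xi) * math.sqrt(n - 3)


print(round(fisher_z_test(0.5, 0.7, 18), 3))   # -1.232  (Illustration 47)
print(round(fisher_z_test(0.5, 0.3, 19), 3))   # 0.959   (Illustration 48)
```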


To test the significance of the difference between two independent correlation coefficients

To test the significance of two correlation coefficients derived from two separate samples, we have to compare the difference of the two corresponding values of Z with the standard error of that difference, remembering that the standard error of the difference of two statistical quantities is the square root of the sum of their variances. In other words, we have to apply the following formula:
$$\frac{Z_1-Z_2}{\sqrt{\frac{1}{n_1-3}+\frac{1}{n_2-3}}}$$
where
$$Z_1=\frac{1}{2}\log_e\frac{1+r_1}{1-r_1}\quad\text{or}\quad1.1513\log_{10}\frac{1+r_1}{1-r_1}$$
and
$$Z_2=\frac{1}{2}\log_e\frac{1+r_2}{1-r_2}\quad\text{or}\quad1.1513\log_{10}\frac{1+r_2}{1-r_2}$$

If the absolute value of this statistic is greater than 1.96, the difference will be significant at 5% level. The following examples will illustrate the application of this test:
Illustration 49 (A). The following data give sample sizes and correlation coefficients. Test the significance of the difference between the two values using Fisher's Z-transformation.

Sample size    Value of r
5              0.870
12             0.560

(B. Com., Sardar Patel University, 1969)
Solution. Applying Z-test,
$$\frac{Z_1-Z_2}{\sqrt{\frac{1}{n_1-3}+\frac{1}{n_2-3}}}$$
Here $r_1=0.87$,
$$Z_1=1.1513\log_{10}\frac{1+0.87}{1-0.87}=1.1513\log_{10}\frac{1.87}{0.13}=1.1513\log_{10}14.385=1.1513\times1.1579=1.333$$
Here $r_2=0.56$,
$$Z_2=1.1513\log_{10}\frac{1+0.56}{1-0.56}=1.1513\log_{10}\frac{1.56}{0.44}=1.1513\log_{10}3.545=1.1513\times0.5496=0.633$$
$$Z_1-Z_2=1.333-0.633=0.700$$
$$S.E.=\sqrt{\frac{1}{5-3}+\frac{1}{12-3}}=\sqrt{0.5+0.111}=0.782$$
$$\frac{Z_1-Z_2}{S.E.}=\frac{0.700}{0.782}=0.895$$
Since the difference is less than 2.58 S.E. (1% level), the experiment provides no evidence against the hypothesis that the samples are drawn from the same population.
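The same idea extends to two independent coefficients. The sketch below (helper name ours) uses `math.atanh` for both transformations and reproduces Z₁ ≈ 1.333 and Z₂ ≈ 0.633. Note that with only 5 pairs in the first sample, n₁ − 3 = 2, so the normal approximation behind the test is rough.

```python
import math


def fisher_z_two_samples(r1, n1, r2, n2):
    """(Z1 - Z2) / sqrt(1/(n1-3) + 1/(n2-3)) with Z = atanh(r); returns (statistic, S.E.)."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se, se


stat, se = fisher_z_two_samples(0.870, 5, 0.560, 12)
print(round(math.atanh(0.870), 3), round(math.atanh(0.560), 3),
      round(se, 3), round(stat, 3))   # 1.333 0.633 0.782 0.896
```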
The Variance Ratio Test—F-test 


The object of the F-test is to discover whether two independent estimates of population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance. For carrying out the test of significance, we calculate the ratio F, defined as:
$$F=\frac{S_1^2}{S_2^2},\qquad\text{where}\quad S_1^2=\frac{\sum(X_1-\bar{X}_1)^2}{n_1-1}\quad\text{and}\quad S_2^2=\frac{\sum(X_2-\bar{X}_2)^2}{n_2-1}$$
$n_1$ and $n_2$ refer to the number of observations in sample I and sample II respectively.

If the calculated value of F exceeds $F_{0.05}$ for $(n_1-1)$, $(n_2-1)$ d.f., we say that the ratio is significant at 5% level. If $F<F_{0.05}$, we say that the samples could have come from two normal populations with the same variance.

It should be noted that the numerator is always the greater variance.

At the end of the text two tables are given, one giving the 5% points of F and another giving the 1% points of F for $v_1$ and $v_2$; $v_1$ is the number of degrees of freedom of the greater variance and $v_2$ is the number of degrees of freedom of the smaller variance.

The tables give the values of F which could be obtained in random sampling from a normal population at the stated probability levels and degrees of freedom. The test is applicable even when the distribution of the population departs considerably from the normal type. The following examples would illustrate the test:
Illustration 49 (B). Two random samples drawn from two normal populations are:

Sample I    20, 16, 26, 27, 23, 22, 18, 24, 25, 19
Sample II   27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37

Obtain estimates of the variances of the populations and test whether the two populations have the same variance.

Solution. Our hypothesis is that the two populations have the same variance.

Applying F-test,
$$F=\frac{S_1^2}{S_2^2}$$

Sample I                             Sample II
X1    (X1 − X̄1)   (X1 − X̄1)²        X2    (X2 − X̄2)   (X2 − X̄2)²
20    −2             4               27    −8            64
16    −6            36               33    −2             4
26    +4            16               42    +7            49
27    +5            25               35     0             0
23    +1             1               32    −3             9
22     0             0               34    −1             1
18    −4            16               38    +3             9
24    +2             4               28    −7            49
25    +3             9               41    +6            36
19    −3             9               43    +8            64
                                     30    −5            25
                                     37    +2             4
ΣX1 = 220   Σ(X1 − X̄1) = 0   Σ(X1 − X̄1)² = 120   ΣX2 = 420   Σ(X2 − X̄2) = 0   Σ(X2 − X̄2)² = 314

$$\bar{X}_1=\frac{220}{10}=22,\qquad\bar{X}_2=\frac{420}{12}=35$$
$$S_1^2=\frac{\sum(X_1-\bar{X}_1)^2}{n_1-1}=\frac{120}{9}=13.33$$
$$S_2^2=\frac{\sum(X_2-\bar{X}_2)^2}{n_2-1}=\frac{314}{11}=28.545$$
$$F=\frac{S_2^2}{S_1^2}=\frac{28.545}{13.33}=2.14$$
(the numerator is always the greater variance)
$v_1=11$ and $v_2=9$ (since $v_1$ is the number of degrees of freedom of the greater variance).
The table value of F for $v_1=11$ and $v_2=9$ at 5% level is 3.11, which is greater than the calculated value of F. Hence the two populations may well have the same variance.


Illustration 50. In one sample of 8 observations, the sum of the squares of the deviations of the sample values from the sample mean was 84.4, and in the other sample of 10 observations it was 102.6. Test whether this difference is significant at 5 per cent level, given that the 5 per cent point of F for $v_1=7$ and $v_2=9$ degrees of freedom is 3.29.

Solution. We are given
$n_1=8$, $\sum(X_1-\bar{X}_1)^2=84.4$
$n_2=10$, $\sum(X_2-\bar{X}_2)^2=102.6$
We have to test whether the difference in the variances of the two samples is significant or not. Let us take the hypothesis that the samples are drawn from the same normal population.
$$F=\frac{S_1^2}{S_2^2}$$
$$S_1^2=\frac{\sum(X_1-\bar{X}_1)^2}{n_1-1}=\frac{84.4}{7}=12.057$$
$$S_2^2=\frac{\sum(X_2-\bar{X}_2)^2}{n_2-1}=\frac{102.6}{9}=11.400$$
$$F=\frac{12.057}{11.400}=1.06$$
For $v_1=7$ and $v_2=9$, $F_{0.05}=3.29$.

The calculated value of F is much less than the table value and, therefore, the result of the experiment does not provide any evidence against the hypothesis. Hence the samples may be regarded as drawn from the same population.
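A sketch of the variance ratio with the greater estimate placed in the numerator, as the text requires. The helper name is ours; `statistics.variance` from the Python standard library divides by n − 1, which matches S² here. Critical values are still read from the F table.

```python
import statistics


def f_ratio(est1, df1, est2, df2):
    """Put the greater variance estimate in the numerator; return (F, v1, v2)."""
    if est1 >= est2:
        return est1 / est2, df1, df2
    return est2 / est1, df2, df1


# Illustration 49 (B): statistics.variance divides by n - 1
s1 = [20, 16, 26, 27, 23, 22, 18, 24, 25, 19]
s2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]
F_b, v1_b, v2_b = f_ratio(statistics.variance(s1), len(s1) - 1,
                          statistics.variance(s2), len(s2) - 1)
print(round(F_b, 2), v1_b, v2_b)      # 2.14 11 9   (table F0.05 = 3.11)

# Illustration 50: only the sums of squared deviations are given
F_50, v1_50, v2_50 = f_ratio(84.4 / 7, 7, 102.6 / 9, 9)
print(round(F_50, 2), v1_50, v2_50)   # 1.06 7 9    (table F0.05 = 3.29)
```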

Illustration 51. Out of 20,000 customers' ledger accounts, a sample of 600 was taken to test the accuracy of posting and balancing, and 45 mistakes were found. Assign limits within which the number of defective cases can be expected at 5% level.

Solution. p, i.e., proportion of mistakes $=\frac{45}{600}=0.075$
q = 1 − 0.075 = 0.925
$$S.E._p=\sqrt{\frac{pq}{n}}=\sqrt{\frac{0.075\times0.925}{600}}=0.011$$
95% confidence limits:
$$p\pm1.96\,S.E._p=0.075\pm1.96\times0.011=0.075\pm0.022=0.053\text{ to }0.097$$
Hence at 5% level of significance it is expected that the number of mistakes would vary between 5.3 and 9.7 per cent.
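Large-sample limits for a proportion can be sketched as below (helper name ours). Without rounding the S.E. to 0.011 the 95% limits for Illustration 51 come out at about 5.4% and 9.6%. Like the worked solution, the sketch applies no finite-population correction for the 20,000 accounts. The second part anticipates the check made in Illustration 52.

```python
import math


def proportion_limits(p, n, z=1.96):
    """Large-sample limits p -/+ z * sqrt(p*q/n); returns (lower, upper, S.E.)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se, se


lo, hi, se = proportion_limits(45 / 600, 600)
print(round(se, 4), round(lo, 3), round(hi, 3))   # 0.0108 0.054 0.096

# Claimed 95% good parts in a sample of 400: range of defectives implied
lo_g, hi_g, se_g = proportion_limits(0.95, 400)
defectives_range = (400 * (1 - hi_g), 400 * (1 - lo_g))
print([round(x, 1) for x in defectives_range])    # [11.5, 28.5]
```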

Illustration 52. In a sample of 400 parts manufactured by a factory, the number of defective parts was found to be 30. The company, however, claims that only 5% of its product is defective. Is the claim tenable?

(I.C.W.A., 1975)

Solution. The company claims that only 5% of the parts are defective, i.e., 95% are good. Hence p = 0.95, q = 0.05 and
$$S.E._p=\sqrt{\frac{pq}{n}}=\sqrt{\frac{0.95\times0.05}{400}}=0.0109$$
95% confidence limits $=p\pm1.96\,S.E._p$
$=0.95\pm1.96\times0.0109$
$=0.95\pm0.0214$, i.e., 0.9714 and 0.9286

Out of 400 parts manufactured, the expected good parts may vary between 0.9286 × 400 and 0.9714 × 400, i.e., between 371 and 388. The number of defectives is thus expected to lie between 12 and 29. Since the actual number is 30, the company's claim that only 5% of its product is defective cannot be accepted.


Illustration 53. In a village 'A', out of a random sample of 1000 persons 100 were found to be vegetarians, while in another village 'B', out of 1500 persons 180 were found to be vegetarians. Do you find a significant difference in the food habits of the people of the two villages? (Degree Prog. in B. Adm. & Commerce, T.U. 1977)
Solution. Let us take the hypothesis that there is no significant difference in the food habits of the people of the two villages. Applying the test of the difference of two proportions,
$$S.E._{(p_1-p_2)}=\sqrt{pq\left(\frac{1}{n_1}+\frac{1}{n_2}\right)},\qquad p=\frac{n_1p_1+n_2p_2}{n_1+n_2}$$
$p_1$, i.e., percentage of vegetarians in village 'A' $=\frac{100}{1000}\times100=10$
$p_2$, i.e., percentage of vegetarians in village 'B' $=\frac{180}{1500}\times100=12$
$$p=\frac{(1000\times10)+(1500\times12)}{1000+1500}=\frac{10000+18000}{2500}=11.2$$
q = 100 − 11.2 = 88.8
$$S.E._{(p_1-p_2)}=\sqrt{11.2\times88.8\left(\frac{1}{1000}+\frac{1}{1500}\right)}=1.288$$
$$\frac{\text{Difference}}{S.E.}=\frac{12-10}{1.288}=1.55$$
Since this value is less than the critical value at 5% level of significance (1.96), there is no reason to doubt the hypothesis. Hence there is no significant difference in the food habits of the people of the two villages.

Illustration 54: In a certain district 'A', 450 persons were considered regular consumers of tea out of a sample of 1,000 persons. In another district 'B', 400 were regular consumers of tea out of a sample of 800 persons. Do these facts reveal a significant difference between the two districts as far as the tea-drinking habit is concerned? (Use 5% level) (C.A. 1975)

Solution: Let us take the hypothesis that there is no significant difference between the two districts with regard to the tea-drinking habit.

$$S.E._{(p_1 - p_2)} = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$$p_1 = \frac{450}{1000} = 0.45, \qquad p_2 = \frac{400}{800} = 0.50$$

$$p = \frac{450 + 400}{1000 + 800} = \frac{17}{36}, \qquad q = 1 - \frac{17}{36} = \frac{19}{36}$$

$$S.E._{(p_1 - p_2)} = \sqrt{\frac{17}{36} \times \frac{19}{36} \times \left(\frac{1}{1000} + \frac{1}{800}\right)} = 0.0237$$

$$\frac{|p_1 - p_2|}{S.E.} = \frac{|0.45 - 0.50|}{0.0237} = 2.11$$

Since the difference is more than 1.96 (5% level), the hypothesis does not hold true. Hence there is a significant difference between the two districts as far as the tea-drinking habit is concerned.
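Illustrations 53 and 54 use the same pooled-proportion standard error, so one small Python sketch can reproduce both. It works with proportions rather than percentages, so the S.E. for Illustration 53 comes out as 0.01288 instead of 1.288; the ratio is unaffected. The function name `two_proportion_z` is our own.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z for p1 - p2, with S.E. based on the pooled proportion p = (x1 + x2)/(n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled proportion
    q = 1 - p
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, se


# Illustration 53: vegetarians, village A (100 of 1000) vs village B (180 of 1500)
z53, se53 = two_proportion_z(100, 1000, 180, 1500)
# Illustration 54: tea drinkers, district A (450 of 1000) vs district B (400 of 800)
z54, se54 = two_proportion_z(450, 1000, 400, 800)

for name, z in [("Illustration 53", z53), ("Illustration 54", z54)]:
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{name}: |z| = {abs(z):.2f}, {verdict} at the 5% level")
```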

Illustration 55: A random sample of size 100 from a large population gave the following distribution:

Value        10–20   20–30   30–40   40–50   50–60
Frequency      13      20      45      13       9

Test the hypothesis that this sample comes from a population with mean 40 against the hypothesis that it does not do so. You are also given that the population standard deviation is 40.
(B. Com., Bombay, 1976)

Solution: Let us take the hypothesis that there is no difference between the sample mean and the hypothetical population mean, i.e., 40.

CALCULATION OF SAMPLE MEAN

Value     f     m     d' = (m − 35)/10     fd'
10–20    13    15           −2             −26
20–30    20    25           −1             −20
30–40    45    35            0               0
40–50    13    45            1              13
50–60     9    55            2              18
      N = 100                       Σfd' = −15

$$\bar X = A + \frac{\sum fd'}{N} \times c = 35 - \frac{15}{100} \times 10 = 33.5$$

$$S.E._{\bar X} = \frac{\sigma}{\sqrt{N}} = \frac{40}{\sqrt{100}} = 4$$

$$\frac{\bar X - \mu}{S.E.} = \frac{33.5 - 40}{4} = -1.625$$

Since the difference (1.625) is less than 1.96 S.E. (5% level of significance), our hypothesis holds true. Hence the sample could have come from a population with mean 40.
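The computation in Illustration 55 can be sketched in Python as below. It takes each class at its mid-point (direct method rather than the step-deviation shortcut, which gives the same mean) and uses the given σ = 40; the helper name `grouped_mean` is our own.

```python
import math


def grouped_mean(class_limits, freqs):
    """Mean of a grouped distribution, taking each class at its mid-point."""
    mids = [(lo + hi) / 2 for lo, hi in class_limits]
    n = sum(freqs)
    return sum(m * f for m, f in zip(mids, freqs)) / n, n


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [13, 20, 45, 13, 9]
mean, n = grouped_mean(classes, freqs)

mu, sigma = 40, 40            # hypothesised mean and given population S.D.
se = sigma / math.sqrt(n)     # standard error of the mean
z = (mean - mu) / se

print(mean, se, z)            # 33.5 4.0 -1.625
```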

Illustration 56. The mean population of a random sample of 400 villages in Jaipur district was found to be 400 with a standard deviation of 12. The mean population of a random sample of 400 villages in Meerut district was found to be 395 with a standard deviation of 15. Is the difference between the two means statistically significant? (M. Com. Meerut, 1976)

Solution. Let us take the hypothesis that the difference between the mean population of the villages in the two districts is not statistically significant.

$$S.E._{(\bar X_1 - \bar X_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \qquad \sigma_1 = 12,\ n_1 = 400,\ \sigma_2 = 15,\ n_2 = 400$$

$$S.E. = \sqrt{\frac{12^2}{400} + \frac{15^2}{400}} = \sqrt{\frac{144}{400} + \frac{225}{400}} = 0.96$$

$$\bar X_1 - \bar X_2 = 400 - 395 = 5, \qquad \frac{\text{Difference}}{S.E.} = \frac{5}{0.96} = 5.21$$

Since the difference is more than 2.58 (1% level of significance), the hypothesis is rejected. Hence the difference between the mean population of the villages in the two districts is statistically significant.
Illustration 57. Electric bulbs manufactured by X and Y Cos. gave the following results:

                         X Co.     Y Co.
No. of bulbs used          100       100
Mean life in hours       1,300     1,248
S.D. in hours               82        93

Using the standard error of the difference between means, state whether there is a significant difference in the mean life of the two makes. (C.A. 1975)

Solution. Let us take the null hypothesis that the mean life of the two makes does not differ significantly.

$$S.E._{(\bar X_1 - \bar X_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \qquad \sigma_1 = 82,\ n_1 = 100,\ \sigma_2 = 93,\ n_2 = 100$$

$$S.E. = \sqrt{\frac{82^2}{100} + \frac{93^2}{100}} = \sqrt{67.24 + 86.49} = 12.399$$

$$\frac{\bar X_1 - \bar X_2}{S.E.} = \frac{1300 - 1248}{12.399} = 4.19$$

Since the difference is more than 1.96 (5% level), our hypothesis does not hold true. Hence there is a significant difference in the mean life of the two makes.
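Illustrations 56 and 57 both use the standard error of the difference between two means for large samples. A minimal Python sketch of that formula, applied to both sets of figures, follows; `two_mean_z` is a name we chose for illustration.

```python
import math


def two_mean_z(mean1, sd1, n1, mean2, sd2, n2):
    """z for the difference of two means, S.E. = sqrt(sd1**2/n1 + sd2**2/n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, se


# Illustration 56: Jaipur vs Meerut villages
z56, se56 = two_mean_z(400, 12, 400, 395, 15, 400)
# Illustration 57: bulbs of X Co. vs Y Co.
z57, se57 = two_mean_z(1300, 82, 100, 1248, 93, 100)

print(f"56: S.E. = {se56:.2f}, z = {z56:.2f}")   # 56: S.E. = 0.96, z = 5.21
print(f"57: S.E. = {se57:.3f}, z = {z57:.2f}")   # 57: S.E. = 12.399, z = 4.19
```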


Illustration 58. In measuring reaction time, a psychologist estimates that the standard deviation is 0.05 seconds. How large a sample of measurements must be taken in order to be 95% confident that the error of his estimate will not exceed 0.01 seconds? (I.C.W.A., 1976)

Solution. Since the error of the estimate is not to exceed 0.01 second,

$$\frac{1.96\,\sigma}{\sqrt{n}} \le 0.01 \quad\Rightarrow\quad 1.96 \times 0.05 \le 0.01\sqrt{n}$$

$$\sqrt{n} \ge \frac{1.96 \times 0.05}{0.01} = 9.8, \qquad n \ge (9.8)^2 = 96.04$$

Hence the size of the sample should be at least 96.04, i.e., about 96 measurements (strictly 97, since n must be a whole number not less than 96.04).
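The sample-size rule n ≥ (zσ/E)² can be written as a tiny Python function; the sketch below assumes z = 1.96 (95% confidence) and rounds up to the next whole measurement. The name `sample_size_for_mean` is hypothetical.

```python
import math


def sample_size_for_mean(sigma, max_error, z=1.96):
    """Smallest whole n with z * sigma / sqrt(n) <= max_error, plus the unrounded value."""
    n_exact = (z * sigma / max_error) ** 2
    return n_exact, math.ceil(n_exact)


n_exact, n_min = sample_size_for_mean(sigma=0.05, max_error=0.01)
print(round(n_exact, 2), n_min)   # 96.04 97
```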


Illustration 59. A population consists of four numbers 3, 7, 11, 15. Consider all possible samples of size two which can be drawn with replacement from this population. Find

(i) the population mean,
(ii) the population standard deviation,
(iii) the mean of the sampling distribution of means,
(iv) the standard deviation of the sampling distribution of means.

Verify (iii) and (iv) directly from (i) and (ii) by use of suitable formulas (which you are to mention). Solve this problem if sampling is without replacement.

Solution. With 4 numbers, 16 samples (i.e., 4 × 4) of size two can be drawn with replacement. The samples with their respective means are shown below:

Sample   Mean    Sample   Mean    Sample   Mean    Sample   Mean
3, 3       3     3, 7       5     3, 11      7     3, 15      9
7, 3       5     7, 7       7     7, 11      9     7, 15     11
11, 3      7     11, 7      9     11, 11    11     11, 15    13
15, 3      9     15, 7     11     15, 11    13     15, 15    15
Total     24               32               40               48

Mean of the sampling distribution of means = (24 + 32 + 40 + 48)/16 = 144/16 = 9

The mean of the population = (3 + 7 + 11 + 15)/4 = 36/4 = 9

Hence the mean of the sampling distribution of means is equal to the population mean.

STANDARD DEVIATION OF SAMPLING DISTRIBUTION

Mean   Deviation (d)    d²      Mean   Deviation (d)    d²
 3     (3 − 9) = −6     36       7     (7 − 9) = −2      4
 5     (5 − 9) = −4     16       9     (9 − 9) = 0       0
 7     (7 − 9) = −2      4      11     (11 − 9) = 2      4
 9     (9 − 9) = 0       0      13     (13 − 9) = 4     16
 5     (5 − 9) = −4     16       9     (9 − 9) = 0       0
 7     (7 − 9) = −2      4      11     (11 − 9) = 2      4
 9     (9 − 9) = 0       0      13     (13 − 9) = 4     16
11     (11 − 9) = 2      4      15     (15 − 9) = 6     36
                  Σd² = 80                        Σd² = 80

S.D. of sampling distribution = √(160/16) = √10 = 3.16

STANDARD DEVIATION OF POPULATION

X     x = (X − 9)    x²
3        −6          36
7        −2           4
11        2           4
15        6          36
       Σx = 0     Σx² = 80

σ = √(Σx²/N) = √(80/4) = √20 = 4.47

The standard deviation of the sampling distribution of means is equal to the standard deviation of the population divided by √n, i.e., σ/√n = 4.47/√2 = 3.16 (n = size of sample).

If the sampling is without replacement, the following six samples can be drawn:

(3, 7), (3, 11), (3, 15), (7, 11), (7, 15), (11, 15)

Their respective means are 5, 7, 9, 9, 11, 13.

Mean of sampling distribution of means = (5 + 7 + 9 + 9 + 11 + 13)/6 = 54/6 = 9

It is the same as the population mean.

Variance of sampling distribution of means
= [(5 − 9)² + (7 − 9)² + (9 − 9)² + (9 − 9)² + (11 − 9)² + (13 − 9)²]/6
= (16 + 4 + 0 + 0 + 4 + 16)/6 = 40/6 = 6.67

S.D. = √6.67 = 2.58

This agrees with the formula for sampling without replacement from a finite population of size N:

$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{\sqrt{20}}{\sqrt{2}}\sqrt{\frac{4 - 2}{4 - 1}} = \sqrt{6.67} = 2.58$$
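The enumeration in Illustration 59 can be verified by brute force. The Python sketch below lists every sample with `itertools.product` (with replacement) and `itertools.combinations` (without replacement), then compares the observed S.D. of the sample means with σ/√n and with the finite-population-corrected formula.

```python
import itertools
import math
from statistics import mean, pstdev

population = [3, 7, 11, 15]
N, n = len(population), 2

mu = mean(population)          # population mean
sigma = pstdev(population)     # population S.D. (divisor N)

# All ordered samples with replacement (4 x 4 = 16) and all samples without replacement (6)
with_repl = [mean(s) for s in itertools.product(population, repeat=n)]
without_repl = [mean(s) for s in itertools.combinations(population, n)]

# Formula values: sigma / sqrt(n), times the finite population correction without replacement
se_with = sigma / math.sqrt(n)
se_without = se_with * math.sqrt((N - n) / (N - 1))

print(mu, round(sigma, 2))     # 9 4.47
print(len(with_repl), mean(with_repl), round(pstdev(with_repl), 2), round(se_with, 2))
print(len(without_repl), mean(without_repl), round(pstdev(without_repl), 2), round(se_without, 2))
```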

Illustration 60. 25 tyres were selected at random from a big lot and their average life was found to be 32,000 kms with a standard deviation of 2,000 kms. Is it possible that the sample has come from a population with a mean life of 35,000 kms? (Degree programme in B. Adm. & Commerce, T.U. 1977)

Solution. Let us take the hypothesis that there is no difference between the sample mean and the hypothetical population mean. Applying the t-test:

$$t = \frac{\bar X - \mu}{S}\sqrt{n}$$

X̄ = 32,000, μ = 35,000, S = 2,000, n = 25

$$t = \frac{32{,}000 - 35{,}000}{2{,}000} \times \sqrt{25} = -1.5 \times 5 = -7.5, \qquad |t| = 7.5$$

ν = n − 1 = 25 − 1 = 24. For ν = 24, t₀.₀₅ = 2.06.

The calculated value of |t| is greater than the table value. Hence the hypothesis is rejected and the sample does not seem to have come from a population with a mean life of tyres of 35,000 kms.
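For summary figures like those of Illustration 60, the t statistic is a one-line calculation. The Python sketch below follows the text's formula t = (X̄ − μ)√n / S with n − 1 degrees of freedom; the table value 2.06 is taken from the solution, and `one_sample_t` is our own name.

```python
import math


def one_sample_t(sample_mean, mu, s, n):
    """t = (sample_mean - mu) / s * sqrt(n), with n - 1 degrees of freedom."""
    return (sample_mean - mu) / s * math.sqrt(n), n - 1


t, df = one_sample_t(sample_mean=32_000, mu=35_000, s=2_000, n=25)
t_table = 2.06   # 5% value for 24 d.f., as quoted in the solution
print(t, df, abs(t) > t_table)   # -7.5 24 True
```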


Illustration 61. Prices of shares of a company on different days in a month were found to be:

66, 65, 69, 70, 69, 71, 70, 63, 64 and 68. Discuss whether the mean price of the shares in the month is 65 (the table value of t = 2.262). (M. Com. Delhi, 1977)

Solution. Let us take the hypothesis that there is no difference between the mean price of the shares and the hypothetical population mean. Applying the t-test,

$$t = \frac{\bar X - \mu}{S}\sqrt{n}$$

CALCULATING MEAN AND STANDARD DEVIATION

X      d = (X − 67)     d²
66         −1            1
65         −2            4
69          2            4
70          3            9
69          2            4
71          4           16
70          3            9
63         −4           16
64         −3            9
68          1            1
ΣX = 675   Σd = 5   Σd² = 73

$$\bar X = \frac{675}{10} = 67.5, \qquad \bar d = \frac{5}{10} = 0.5$$

$$S = \sqrt{\frac{\sum d^2 - n\bar d^{\,2}}{n - 1}} = \sqrt{\frac{73 - 10(0.5)^2}{9}} = 2.799$$

$$t = \frac{67.5 - 65}{2.799}\sqrt{10} = 2.82$$

ν = 10 − 1 = 9. For ν = 9, t₀.₀₅ = 2.262. The calculated value of t is greater than the table value, so our hypothesis does not hold good. Hence the mean price of the shares could not be Rs. 65.
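With raw data, as in Illustration 61, Python's `statistics` module gives the sample mean and the sample S.D. (divisor n − 1) directly, so the working table can be checked as follows. The table value 2.262 is the one given in the question.

```python
import math
from statistics import mean, stdev

prices = [66, 65, 69, 70, 69, 71, 70, 63, 64, 68]
mu = 65                       # hypothesised mean price
n = len(prices)

x_bar = mean(prices)          # 67.5
s = stdev(prices)             # sample S.D. with divisor n - 1
t = (x_bar - mu) / s * math.sqrt(n)

t_table = 2.262               # 5% value for n - 1 = 9 d.f., as given in the question
print(round(s, 3), round(t, 2), t > t_table)   # 2.799 2.82 True
```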


Limitations of Tests of Significance 
In testing statistical significance the following points must be noted : 
1. They should not be used mechanically. Tests of significance are simply the raw materials from which to make decisions, not decisions in themselves. There may be situations where real differences exist but do not produce evidence that they are statistically significant, or the other way round. In each case it is absolutely necessary to exercise great care before taking a decision.
2. Conclusions are to be given in terms of probabilities and not certainties. When a test shows that a difference is statistically significant, it suggests that the observed difference is probably not due to chance. Thus statements are not made with certainty but with a knowledge of probability. "Unusual" events do happen once in a while.

3. They do not tell us "why" the difference exists. Though tests can indicate that a difference has statistical significance, they do not tell us why the difference exists. However, they do suggest the need for further investigation in order to reach definite answers.

4. If we are to have confidence in a hypothesis it must have support 
beyond the statistical evidence. It must have a rational basis. This phrase 
suggests two conditions: first, the hypothesis must be 'reasonable' in the
sense of concordance with a prior expectation. Secondly, the hypothesis 
must fit logically into the relevant body of established knowledge. 

The above points clearly show that in problems of statistical 
significance as in other statistical problems, technique must be combined 
with good judgment and knowledge of the subject-matter. 
χ² Test and Goodness of Fit
In the previous chapter the tests of significance were made on the assumption that the form or type of population distribution was known. For example, a test of significance might be based on the assumption that the sample values were drawn from a normally distributed universe, or that two samples were drawn from universes having the same variance. The testing procedures also assumed that the unknown values of the parameters, about which statistical inferences were to be made, could be estimated from statistics obtained from random samples. This approach to inferential statistics is called parametric methods, since the concern is with the value of a parameter.

There are many situations in which it is not possible for the statistician to make rigid assumptions about the shape of the population from which samples are being drawn. This limitation has led to the development of a group of alternative techniques known as non-parametric or distribution-free methods. A non-parametric method may be defined as a statistical test in which no hypothesis is made about specific values of parameters. Distribution-free tests may be defined as methods for testing a hypothesis that do not depend on assumptions concerning the form of the underlying distribution.

The χ² test (pronounced as chi-square test) is one of the simplest and most widely used non-parametric tests in statistical work. It has many applications in situations that involve testing of hypotheses concerning discrete or qualitative data. The Greek letter χ² was first used to describe this statistic by Karl Pearson in the year 1900. The quantity χ² describes the magnitude of discrepancy between theory and observation, i.e., with the help of the χ² test we are in a position to know whether a given discrepancy between theory and observation can be attributed to chance or whether it results from the inadequacy of the theory to fit the observed facts. The greater the discrepancy between the observed and expected frequencies, the greater is the value of χ².

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O refers to the observed frequencies and E refers to the expected frequencies.

Steps. To determine the value of χ², the steps required are:

(i) Calculate the expected frequencies.

(ii) Take the difference between observed and expected frequencies and obtain the squares of these differences, i.e., obtain the values of (O − E)².

(iii) Divide the quantity (O − E)² obtained in step (ii) by the expected frequency and obtain the total Σ(O − E)²/E. This gives the value of χ².
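Steps (ii) and (iii) amount to a single sum, sketched below in Python. The function name `chi_square` is our own, and the sample figures are the observed and expected frequencies worked out in Illustration 4 (voting by area) later in this chapter.

```python
def chi_square(observed, expected):
    """Pearson's chi-square: the sum of (O - E)**2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed and expected frequencies from Illustration 4 (voting by area)
observed = [620, 380, 480, 520]
expected = [550, 450, 550, 450]
print(round(chi_square(observed, expected), 2))   # 39.6
```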
The calculated value of χ² is compared with the table value* of χ² for given degrees of freedom at a certain specified level of significance. If at the stated level (generally the 5% level is selected) the calculated value of χ² is more than the table value of χ², the difference between theory and observation is considered to be significant, i.e., it could not have arisen due to fluctuations of simple sampling. If, on the other hand, the calculated value of χ² is less than the table value, the difference between theory and observation is not considered significant, i.e., it is regarded as due to fluctuations of simple sampling and hence ignored.

It should be noted that the value of χ² is never negative (it is zero only when observed and expected frequencies agree exactly) and its upper limit is infinity. Also, since χ² is derived from observations, it is a statistic and not a parameter (there is no parameter corresponding to it). The χ² test is, therefore, termed non-parametric. It is one of the great advantages of this test that it involves no assumption about the form of the original distributions from which the observations come.

Degrees of Freedom†

While comparing the calculated value of χ² with the table value we have to determine the degrees of freedom. By degrees of freedom we mean the number of classes to which the values can be assigned arbitrarily or at will without violating the restrictions or limitations placed. For example, if we are to choose any five numbers whose total is 100, we can exercise our independent choice for any four numbers only; the fifth number is fixed by virtue of the total being 100, as it must be equal to 100 minus the total of the four numbers selected. Thus, if the four numbers are 20, 35, 15 and 10, the fifth number must be [100 − (20 + 35 + 15 + 10)] = 20. Though we were to choose any five numbers, we could choose any four only. Our choice was reduced by one because of one condition placed on the data, i.e., that of the total being 100. There was thus only one restraint on our freedom, and our degrees of freedom were only four. If more restrictions are placed, our freedom to choose will be still further curtailed. For example, if there are 10 classes and we want our frequencies to be distributed in such a manner that the number of cases, the mean and the standard deviation agree with the original distribution, we have three constraints (restrictions) and so three degrees of freedom are lost. Hence in this case the degrees of freedom will be 10 − 3 = 7. Thus the number of degrees of freedom is obtained by subtracting from the number of classes the number of degrees of freedom lost in fitting. Symbolically, the degrees of freedom are denoted by the symbol ν (pronounced nu) or by d.f. and are obtained as follows:

ν = n − k

where k refers to the number of independent constraints. We have a constraint or restriction whenever observed or theoretical frequencies are made to agree with one another in some one respect in the operations that lead to the calculation of χ². Thus a constraint is imposed by the condition $\sum f_o = \sum f_e$. In general, when we fit a binomial distribution, the number of degrees of freedom is one less than the number of classes; when we fit a Poisson distribution, the degrees of freedom are 2 less than the number of classes (∵ we use the total frequency and the arithmetic mean); and when we fit a normal curve, the number of degrees of freedom is 3 less than the number of classes (because in the fitting we use the total frequency, mean and standard deviation).

* The table value of χ² gives us how χ² is distributed when chance alone is operative in bringing about differences between expectation and observation.

† "The degrees of freedom may be considered as the number of n independent observations in the sample minus the number of m parameters (required to compute the statistic) which must be estimated by sample observations. Thus the number of degrees of freedom is ν = n − m." Chou: Statistical Analysis, p. 374.

In a contingency table the degrees of freedom are calculated in a slightly different manner. The marginal totals place a limit on our choice of selecting cell frequencies. The cell frequencies of all columns but one (c − 1) and of all rows but one (r − 1) can be assigned arbitrarily, and so the number of degrees of freedom for all the cell frequencies is (c − 1)(r − 1), where c refers to the number of columns and r to the number of rows. Thus in a 2 × 2 table the degrees of freedom = (2 − 1)(2 − 1) = 1. Having filled up one cell in such a table, the rest of the frequencies automatically follow; there is no choice for them. Similarly, in a 3 × 3 contingency table the number of degrees of freedom is (3 − 1)(3 − 1) = 4, and so on. It means only 4 expected frequencies need be computed. The others are obtained by subtraction from the marginal totals.
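A contingency table's expected frequencies (row total × column total ÷ N, the rule used in the illustrations of this chapter) and its degrees of freedom can be sketched in Python as follows. The function names are our own; the observed table is that of Illustration 1 (cholera and inoculation).

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def contingency_df(table):
    """Degrees of freedom (r - 1)(c - 1) for an r x c contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Observed frequencies of Illustration 1 (cholera and inoculation)
observed = [[31, 469], [185, 1315]]
print(expected_frequencies(observed))   # [[54.0, 446.0], [162.0, 1338.0]]
print(contingency_df(observed))         # 1
```

Because every expected row and column adds up to the observed marginal totals, fixing (r − 1)(c − 1) cells determines the rest, which is the point made in the paragraph above.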

The distribution of χ² depends on the degrees of freedom. There is a different χ² distribution for each number of degrees of freedom. The distribution is very skewed to the right for small degrees of freedom, and as the degrees of freedom increase the curve becomes more and more symmetric and becomes approximately normal for large degrees of freedom. The above figure gives the distribution of χ² for 2, 4 and 8 degrees of freedom.

It is clear from the diagram that as the degrees of freedom increase the curve becomes more and more symmetric. The mean of the χ² distribution is equal to the number of degrees of freedom and its variance is equal to twice the degrees of freedom.


The χ² Test when the Degrees of Freedom exceed 30

The table values of χ² are available only up to 30 degrees of freedom. For degrees of freedom greater than 30, the distribution of √(2χ²) approximates the normal distribution.* For degrees of freedom greater than 30 the approximation is acceptably close. The mean of the distribution of √(2χ²) is √(2ν − 1), and the standard deviation is equal to 1. Thus the application of the test is simple, for the deviation of √(2χ²) from √(2ν − 1) may be interpreted as a normal deviate with unit standard deviation. That is,

$$z = \sqrt{2\chi^2} - \sqrt{2\nu - 1}$$
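The large-ν approximation is a one-line formula; the Python sketch below simply evaluates it. The figures χ² = 50 with ν = 40 are hypothetical and only show the arithmetic; the resulting z is then read against normal tables.

```python
import math


def chi_square_normal_deviate(chi2, df):
    """z = sqrt(2 * chi2) - sqrt(2 * df - 1), treated as a standard normal deviate for df > 30."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)


# Hypothetical figures for illustration only: chi-square = 50 with 40 degrees of freedom
z = chi_square_normal_deviate(50, 40)
print(round(z, 2))   # 1.11
```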
Alternative Method of Obtaining the Value of χ²

In a 2 × 2 table where the cell frequencies and marginal totals are as below:

a          b          (a + b)
c          d          (c + d)
(a + c)    (b + d)    N

if N is the total frequency and ad the larger cross-product, the value of χ² can easily be obtained by the following formula:

$$\chi^2 = \frac{(ad - bc)^2 \cdot N}{(a + c)(b + d)(c + d)(a + b)}$$

with Yates's correction

$$\chi^2 = \frac{\left(ad - bc - \tfrac{1}{2}N\right)^2 \cdot N}{(a + c)(b + d)(c + d)(a + b)}$$
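Both shortcut formulas can be sketched as one Python function. Using |ad − bc| means the cells need not be arranged so that ad is the larger product; clamping the corrected difference at zero is our own design choice for the rare case |ad − bc| < N/2. The checks use Illustrations 2 and 3 of this chapter; for Illustration 3 the exact corrected value is about 7.90, while the worked table, which rounds the expected frequencies, arrives at 7.80.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Shortcut chi-square for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, |ad - bc| is reduced by N/2 before squaring (Yates's correction);
    the reduced difference is not allowed to fall below zero.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)
    return diff ** 2 * n / ((a + c) * (b + d) * (c + d) * (a + b))


# Illustration 2 (fertilizer use), no correction
print(round(chi_square_2x2(208, 92, 32, 168), 2))            # 136.75
# Illustration 3 (tuberculosis vaccine), with Yates's correction
print(round(chi_square_2x2(12, 26, 16, 6, yates=True), 2))   # 7.9
```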
Conditions for Applying χ² Test

The following conditions should be satisfied before applying the χ² test:

1. In the first place, N must be reasonably large. When N is small, the probability given by the χ² test is too small, with the result that the χ² test might lead to a hypothesis being discredited whereas the exact procedure might cause one not to discredit a hypothesis. It is difficult to say exactly what constitutes largeness, but as an arbitrary figure we may say that N should be at least 50, however few the cells.

2. No theoretical cell frequency should be small. Here again it is hard to say what constitutes smallness, but 5 should be regarded as the very minimum and 10 is better. In practice, data not infrequently contain cell frequencies below these limits. As a rule, the difficulty may be met by amalgamating such cells into a single cell entitled '10 and over'.

3. The constraints must be linear.‡
Yates's Correction

One of the conditions for the application of the χ² test is that no cell frequency should be less than 5 in any case, though 10 is better. This


* We have seen that the distribution of χ² tends to normality as the degrees of freedom increase. However, R.A. Fisher has shown that this tendency is more pronounced for the quantity √(2χ²) than for χ²; thus for a stated value of ν we get a better approximation to normality by using the distribution of the former quantity.

† Yule and Kendall: An Introduction to the Theory of Statistics, p. 469.

‡ Constraints which involve linear equations in the cell frequencies (i.e., equations containing no squares or higher powers of the frequencies) are called linear constraints.
requirement is to avoid inflated chi-square values due to the division of the squared differences by a small expected frequency. When the theoretical frequencies are less than 10, and especially less than 5, the ordinary table values of χ² are less reliable. This is especially true for one degree of freedom; it is true to a lesser extent for two or three degrees of freedom. However, the error is negligible for more than three degrees of freedom.

In the special case of a 2 × 2 contingency table the approximation may be improved, and the bias arising out of the use of small theoretical frequencies may be reduced, by means of a correction proposed by F. Yates in 1934. The correction involves the reduction of the deviation of observed from theoretical frequencies, which of course reduces the value of χ². The working rule for the application of the correction is: adjust the observed frequency in each cell of the 2 × 2 table in such a way as to reduce the absolute deviation of the observed from the theoretical frequency for that cell by ½; adjustments for all the cells are to be made without changing the marginal totals. This operation will increase f_o, that is, the observed frequency, by ½ in each of two cells, and will reduce f_o by ½ in each of the other two cells (please refer to Illustration 5).

Another method of adjustment which gives the same result as the above procedure is:

$$\chi^2_{\text{corrected}} = \frac{\left(|O_1 - E_1| - 0.5\right)^2}{E_1} + \frac{\left(|O_2 - E_2| - 0.5\right)^2}{E_2} + \cdots + \frac{\left(|O_k - E_k| - 0.5\right)^2}{E_k}$$

In general, the correction is made only when the number of degrees of freedom ν = 1 and N is small. For large samples this yields practically the same result as the uncorrected χ². For small samples where each expected frequency is between 5 and 10, it is perhaps best to compare both the corrected and uncorrected values of χ². If both values lead to the same conclusion regarding a hypothesis, such as acceptance or rejection at the 0.05 level, difficulties are rarely encountered. If they lead to different conclusions, one can either resort to increasing sample sizes or, if this proves impractical, employ exact methods of probability involving the multinomial distribution.
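The advice to compare corrected and uncorrected values can be sketched with the cell-by-cell formula above. The Python example below uses the 2 × 2 table of Illustration 3 later in this chapter; `chi_square_cells` is a hypothetical helper, and clamping a deviation smaller than 0.5 at zero is our own choice.

```python
def chi_square_cells(observed, expected, yates=False):
    """Sum of (|O - E| - 0.5)**2 / E with Yates's correction, else (O - E)**2 / E."""
    total = 0.0
    for o, e in zip(observed, expected):
        dev = abs(o - e)
        if yates:
            dev = max(dev - 0.5, 0.0)   # do not let the corrected deviation go negative
        total += dev ** 2 / e
    return total


# Illustration 3: rows = inoculated / not inoculated, columns = affected / not affected
table = [[12, 26], [16, 6]]
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
n = sum(rows)
observed = [table[i][j] for i in range(2) for j in range(2)]
expected = [rows[i] * cols[j] / n for i in range(2) for j in range(2)]

uncorrected = chi_square_cells(observed, expected)
corrected = chi_square_cells(observed, expected, yates=True)
print(round(uncorrected, 2), round(corrected, 2))   # 9.48 7.9
```

Both values exceed 3.84, so here the corrected and uncorrected tests lead to the same conclusion.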

Grouping when Individual Frequencies are Small 

If small theoretical frequencies occur (less than 10, and certainly less than 5), it is generally possible to overcome the difficulty by grouping two or more classes together. In other words, one or more classes with theoretical frequencies less than 5 may be combined into a single category before calculating the difference between observed and expected frequencies. With this practice it is important to remember that the number of degrees of freedom is determined by the number of classes after the regrouping. For example, if we start with 12 classes and 3 of them have small theoretical frequencies, we may pool these classes into one and shall be left with 10 classes to compare (please refer to Illustration 8).
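One possible way to mechanise this pooling for ordered classes is sketched below in Python. The rule (merge adjacent classes from left to right until the pooled expected frequency reaches 5, folding any small tail into the last group) is our own assumption, not a procedure from the text, and the frequencies are hypothetical. The degrees of freedom are then counted on the pooled classes.

```python
def pool_small_classes(observed, expected, minimum=5):
    """Merge adjacent classes, left to right, until each pooled expected frequency >= minimum.

    A leftover group at the end that is still too small is merged into the previous group.
    """
    pooled_o, pooled_e = [], []
    acc_o = acc_e = 0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o = acc_e = 0
    if acc_e > 0:                      # small remainder at the tail
        if pooled_e:
            pooled_o[-1] += acc_o
            pooled_e[-1] += acc_e
        else:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    return pooled_o, pooled_e


# Hypothetical frequencies with small expected values in both tails
observed = [2, 8, 20, 35, 22, 9, 3, 1]
expected = [1.5, 9.0, 21.0, 33.0, 23.0, 8.5, 3.0, 1.0]
pooled_obs, pooled_exp = pool_small_classes(observed, expected)
print(len(observed), "classes pooled into", len(pooled_obs))   # 8 classes pooled into 5
```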

Uses of χ² Test

The χ² test is one of the simplest and the most general tests known. It is applicable to a very large number of problems in practice which can be summed up under the following heads:

1. χ² test as a test of independence. With the help of the χ² test we can find out whether or not two attributes are associated. Suppose we have N observations classified according to two criteria. We may ask whether the criteria are related or independent. Thus, we can find out whether quinine is effective in controlling fever or not, whether there is any association between marriage and failure, or between the eye colour of husband and wife. In order to test whether or not the attributes are associated we take the null hypothesis that there is no association in the attributes under study or, in other words, that the two attributes are independent. If the calculated value of χ² is less than the table value at a certain level of significance (generally the 5% level), we say that the results of the experiment provide no evidence for doubting the hypothesis or, in other words, the hypothesis that the attributes are not associated holds good. On the other hand, if the calculated value of χ² is greater than the table value at a certain level of significance, we say that the results of the experiment do not support the hypothesis or, in other words, the attributes are associated. It should be noted that χ² is not a measure of the degree or form of relationship; it only tells us whether two principles of classification are or are not significantly related, without reference to any assumptions concerning the form of relationship.

2. χ² test as a test of goodness of fit*. The χ² test is very popularly known as the test of goodness of fit for the reason that it enables us to ascertain how well the theoretical distributions such as binomial, Poisson, normal, etc., fit empirical distributions, i.e., those obtained from sample data. When an ideal frequency curve, whether normal or some other type, is fitted to the data, we are interested in finding out how well this curve fits the observed facts. A test of the concordance (goodness of fit) of the two can be made just by inspection, but such a test is obviously inadequate. Precision can be secured by applying the χ² test. If the calculated value of χ² is less than the table value at a certain level of significance (generally the 5% level), the fit is considered to be good, i.e., the divergence between the actual and expected frequencies is attributed to fluctuations of simple sampling. On the other hand, if the calculated value of χ² is greater than the table value, the fit is considered to be poor, i.e., the divergence cannot be attributed to fluctuations of simple sampling; rather, it is due to the inadequacy of the theory to fit the observed facts.

It should be borne in mind that in repeated sampling too good a fit is just as likely as too bad a fit. When the computed chi-square value is too close to zero, we should suspect the possibility that the two sets of frequencies have been manipulated in order to force them to agree and, therefore, the design of our experiment should be thoroughly checked.†

3. χ² test as a test of homogeneity‡. The χ² test of homogeneity is an extension of the chi-square test of independence. In both cases we are concerned with cross-classified data. The same testing statistic used for tests of independence is used for tests of homogeneity. These two types of tests are, however, different in a number of ways. First, they are

* Actually the χ² test is a test of badness of fit, since the result of the test leads the statistician to conclude either that the fit of a normal distribution to the observed distribution is bad or that the evidence that it is bad is not convincing and, therefore, it may be said to be "good". Griffin, p. 263.

† Chou: Statistical Analysis, p. 457.

‡ Tests of homogeneity are designed to determine whether two or more independent samples are drawn from the same population or from different populations.
associated with different kinds of problems. Tests of independence are concerned with the problem of whether one attribute is independent of another, while tests of homogeneity are concerned with whether different samples come from the same population. Secondly, the former involves a single sample taken from one population, but the latter involves two or more independent samples, one from each of the possible populations in question.

The following examples will illustrate the application of the χ² test.

Illustration 1. The table given below shows the data obtained during an epidemic of cholera:

                    Attacked    Not attacked    Total
Inoculated              31           469          500
Not inoculated         185         1,315        1,500
Total                  216         1,784        2,000

Test the effectiveness of inoculation in preventing the attack of cholera.
(M. Com. Gorakhpur, 1973)

Solution. Let us take the hypothesis that inoculation is not effective in preventing the attack of cholera, i.e., inoculation and attack are independent. On the basis of this hypothesis the expected frequency corresponding to the number of persons inoculated and attacked would be

$$\text{Expectation of } (AB) = \frac{(A) \times (B)}{N}$$

[Attack is denoted by A and inoculation by B]

(A) = 216, (B) = 500, N = 2,000

$$\text{Expectation of } (AB) = \frac{216 \times 500}{2000} = 54$$

Having calculated one value, we can find out the other values as follows:

                    Attacked    Not attacked    Total
Inoculated              54           446          500
Not inoculated         162         1,338        1,500
Total                  216         1,784        2,000

Now put the observed and expected frequencies side by side.

O         E        (O − E)    (O − E)²    (O − E)²/E
31        54        −23         529         9.796
185       162        23         529         3.265
469       446        23         529         1.186
1,315     1,338     −23         529         0.395
                                Σ(O − E)²/E = 14.642

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 14.642$$

ν = (r − 1)(c − 1); r = 2, c = 2; ν = (2 − 1)(2 − 1) = 1. For ν = 1, χ²₀.₀₅ = 3.84.

The calculated value of χ² is higher than the table value and hence the result of the experiment does not support the hypothesis. We, therefore, conclude that inoculation is effective in preventing the attack of cholera.
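For one degree of freedom, the upper-tail probability of χ² can be computed without tables, since a χ² variable with 1 d.f. is the square of a standard normal deviate: P(χ² > x) = erfc(√(x/2)). The Python sketch below applies this to Illustration 1 using `math.erfc`; unrounded, χ² is about 14.643 (the text's 14.642 comes from rounding each term). The helper name is our own.

```python
import math


def chi_square_p_value_df1(chi2):
    """Upper-tail probability P(chi-square with 1 d.f. > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [31, 185, 469, 1315]
expected = [54, 162, 446, 1338]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi2, 3))                           # 14.643
print(round(chi_square_p_value_df1(3.84), 3))   # 0.05, the 5% point used above
print(chi_square_p_value_df1(chi2) < 0.05)      # True
```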


Illustration 2. Based on information on 500 randomly selected fields about the tenancy status of the cultivators of these fields and use of fertilizers collected in an agro-economic enquiry, the following classification was noted:

                         Owned    Rented
Using fertilizer           208       92
Not using fertilizer        32      168

Would you conclude that owner-cultivators are more inclined towards the use of fertilizers? (M.A. Econ., Delhi, 1972)

Solution. Let us take the hypothesis that there is no association between the tenancy status of the cultivator and the use of fertilizers. On the basis of this hypothesis the expectation for fields owned and using fertilizers is:

$$\text{Expectation of } (AB) = \frac{(A) \times (B)}{N}$$

where A denotes owned fields and B denotes fields using fertilizer.

$$\text{Expectation of } (AB) = \frac{240 \times 300}{500} = 144$$

The table of expected frequencies is:

        144     156     300
         96     104     200
        240     260     500

Applying the χ² test:

O       E       (O − E)²    (O − E)²/E
208     144      4,096        28.444
32       96      4,096        42.667
92      156      4,096        26.256
168     104      4,096        39.385
                 Σ(O − E)²/E = 136.752

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 136.752$$

ν = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. For ν = 1, χ²₀.₀₅ = 3.84. The calculated value of χ² is much higher than the table value and hence the result of the experiment does not support the hypothesis. We, therefore, conclude that the owner-cultivators are more inclined towards the use of fertilizers.

When the alternative method of calculating the value of χ² is applied, we shall proceed as follows:
$$\chi^2 = \frac{(ad - bc)^2 \cdot N}{(a + c)(b + d)(c + d)(a + b)}$$

Here a = 208, d = 168, b = 92, c = 32, N = 500.

Substituting these values,

$$\chi^2 = \frac{(208 \times 168 - 92 \times 32)^2 \times 500}{(208 + 32)(92 + 168)(32 + 168)(208 + 92)} = \frac{(34944 - 2944)^2 \times 500}{(240)(260)(200)(300)} = \frac{512000}{3744} = 136.75$$

Thus we find that the answer is the same as obtained above.
Illustration 3. In an experiment on immunization of cattle from tuberculosis,
the following results were obtained :

                    Affected    Not affected
Inoculated             12            26
Not inoculated         16             6

Calculate χ² and discuss the effect of vaccine in controlling susceptibility to
tuberculosis (5% value of χ² for one degree of freedom = 3.84).
(M. Com. Meerut, 1971)


Solution. Let us take the hypothesis that vaccine is not effective in
controlling susceptibility to tuberculosis. Applying χ² test :

Expectation of (AB) $= \frac{(A) \times (B)}{N} = \frac{38 \times 28}{60} = 17.7$


The table of expected frequencies will be as follows :

                    Affected    Not affected    Total
Inoculated            17.7          20.3          38
Not inoculated        10.3          11.7          22
Total                 28            32            60


Since one of the observed frequencies is less than 10, we will apply Yates' correction
(reducing each |O − E| by 0.5) and then apply the χ² test.

  O       E      (|O − E| − 0.5)²    (|O − E| − 0.5)²/E
 12     17.7          27.04                1.53
 16     10.3          27.04                2.63
 26     20.3          27.04                1.33
  6     11.7          27.04                2.31
                                  Σ = 7.80

$v = (r-1)(c-1) = (2-1)(2-1) = 1$

For $v = 1$, $\chi^2_{0.05} = 3.84$.

Since the calculated value of χ² is greater than the table value, the hypothesis is
rejected. We, therefore, conclude that vaccine is effective in controlling susceptibility to
tuberculosis.
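The Yates-corrected statistic also has a 2 × 2 shortcut, N(|ad − bc| − N/2)² / [(a+b)(c+d)(a+c)(b+d)], which is the standard corrected form (not taken from the text). A minimal Python sketch, with an illustrative function name: because it does not round the expected frequencies to one decimal as the hand computation does, it gives about 7.90 rather than 7.80, which leads to the same conclusion.

```python
def chi2_yates(a, b, c, d):
    """2 x 2 chi-square with Yates' continuity correction:
    N(|ad - bc| - N/2)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    # clamp at zero so the correction can never overshoot
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))


# Illustration 3: rows = inoculated / not inoculated; columns = affected / not affected
chi2 = chi2_yates(12, 26, 16, 6)
print(round(chi2, 2))   # 7.9
print(chi2 > 3.84)      # True
```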
Illustration 4. Examine by any suitable method whether the nature of area is
related to voting preference in the election for which the data are tabulated below.

                  Votes for
Area            A         B       Total
Rural          620       480      1,100
Urban          380       520        900
Total        1,000     1,000      2,000

(χ² at 5% level for v = 1 is 3.84)     (M. Com., Delhi, 1973)
Solution. Let us take the hypothesis that the nature of area is not related to
the voting preference in the election, i.e., the two attributes are independent.

The expected frequency of votes for candidate A in the rural area

$= \frac{1100 \times 1000}{2000} = 550$

We can find out the other cell frequencies by putting this value in the 2 × 2 table.

               A         B       Total
Rural         550       550      1,100
Urban         450       450        900
Total       1,000     1,000      2,000


  O       E      (O − E)²    (O − E)²/E
 620     550      4,900         8.9
 380     450      4,900        10.9
 480     550      4,900         8.9
 520     450      4,900        10.9
                        Σ = 39.6

$\chi^2 = \sum \frac{(O-E)^2}{E} = 39.6$

For $v = 1$, $\chi^2_{0.05} = 3.84$.

The calculated value of χ² is greater than the table value and hence our
hypothesis does not hold good. We, therefore, conclude that the nature of area is related
to voting preference in the election.


Illustration 5. A certain drug is claimed to be effective in curing colds. In an
experiment on 164 people with colds, half of them were given the drug and half of
them were given sugar pills. The patients' reactions to the treatment are recorded in the
following table. Test the hypothesis that the drug is no better than sugar pills for
curing colds.

               Helped    Harmed    No effect
Drug             52        10         20
Sugar pills      44        12         26

(M. Com., Calcutta, 1969 ; M.B.A., Delhi, 1974 ;
M. Com. Punjabi University 1975)


Solution. Our hypothesis is that the drug is no better than sugar pills for
curing colds. Let us find out the expected frequencies and then apply the χ² test.

               Helped    Harmed    No effect    Total
Drug           52 (a)    10 (c)     20 (e)        82
Sugar pills    44 (b)    12 (d)     26 (f)        82
Total          96        22         46           164

Expected frequency corresponding to (a) $= \frac{82 \times 96}{164} = 48$

Expected frequency corresponding to (c) $= \frac{82 \times 22}{164} = 11$

Expected frequency corresponding to (e) $= \frac{82 \times 46}{164} = 23$

The table of expected frequencies is :

               Helped    Harmed    No effect    Total
Drug           48 (a)    11 (c)     23 (e)        82
Sugar pills    48 (b)    11 (d)     23 (f)        82
Total          96        22         46           164
Applying the χ² test :

  O      E     (O − E)²    (O − E)²/E
 52     48        16          0.333
 44     48        16          0.333
 10     11         1          0.091
 12     11         1          0.091
 20     23         9          0.391
 26     23         9          0.391
                     Σ = 1.631

$\chi^2 = \sum \frac{(O-E)^2}{E} = 1.631$

$v = (r-1)(c-1) = (2-1)(3-1) = 2$

For $v = 2$, $\chi^2_{0.05} = 5.991$.

The calculated value of χ² is less than the table value and hence the result of
the experiment does not provide any evidence against the hypothesis. Therefore, the drug
is no better than sugar pills in curing colds.
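The same steps (expected = row total × column total / N, then Σ(O − E)²/E, with (r − 1)(c − 1) degrees of freedom) work for a table of any size. A minimal Python sketch, assuming the observed counts are given as a list of rows; the function name is illustrative, not from the text.

```python
def contingency_chi2(table):
    """Chi-square test of independence for an r x c table of observed counts.
    Returns (chi-square, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Illustration 5: rows = drug / sugar pills; columns = helped / harmed / no effect
chi2, df = contingency_chi2([[52, 10, 20], [44, 12, 26]])
print(round(chi2, 3), df)   # 1.631 2
```

Passing the tables of Illustrations 16, 21 and 25 in the same row-by-row form should give about 40.159 (6 d.f.), 5.801 (3 d.f.) and 10.983 (4 d.f.).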


Illustration 6. Two investigators draw samples from the same town in order to
estimate the number of persons falling in the income groups "poor," "middle class" and
"well-to-do." (The limits of the groups are defined in terms of money and are the
same for both investigators.) Their results are as follows :

                           Income group
Investigator     Poor       Middle class    Well-to-do    Total
A              140 (a)        100 (b)         15 (c)        255
B              140 (d)         50 (e)         20 (f)        210
Total          280            150             35            465

Show that the sampling technique of at least one of the investigators is suspect.


Solution. Let us take the hypothesis that the samples are drawn at random by
the investigators, i.e., there is no suspicion about the sampling technique of the two
investigators. On the basis of this hypothesis, the expected frequencies corresponding
to (a) and (b) are :

Expectation for (a) $= \frac{255}{465} \times 280 = 153.6$

Expectation for (b) $= \frac{255}{465} \times 150 = 82.3$

The table of expected frequencies is given below :

      153.6      82.3      19.1      255
      126.4      67.7      15.9      210
      280       150        35        465
Applying the χ² test :

  O       E       (O − E)²    (O − E)²/E
 140    153.6      184.96        1.204
 140    126.4      184.96        1.463
 100     82.3      313.29        3.807
  50     67.7      313.29        4.628
  15     19.1       16.81        0.880
  20     15.9       16.81        1.057
                         Σ = 13.039

$\chi^2 = \sum \frac{(O-E)^2}{E} = 13.039$

$v = (r-1)(c-1) = (2-1)(3-1) = 2$

For $v = 2$, $\chi^2_{0.05} = 5.991$.

The calculated value of χ² is greater than the table value and thus the result of
the experiment does not support the hypothesis. The sampling technique of at least one
of the investigators is therefore suspect.


Illustration 7. Three hundred digits were chosen at random from a set of
tables. The frequencies of the digits were as follows :

Digit         0    1    2    3    4    5    6    7    8    9
Frequency    28   29   33   31   26   35   32   30   31   25

Using the χ² test, assess the hypothesis that the digits were distributed in equal
numbers in the table. (The 5% value of χ² for 9 degrees of freedom is 16.92.)
(M. Com. Gorakhpur, 1974)
Solution. We have to test the hypothesis that the digits were distributed in
equal numbers. On the basis of this hypothesis we should expect 300/10 = 30 as the
frequency for each of the digits 0, 1, 2, ..., 9. Applying the χ² test :

 Digit    O     E    (O − E)²    (O − E)²/E
   0     28    30       4          0.133
   1     29    30       1          0.033
   2     33    30       9          0.300
   3     31    30       1          0.033
   4     26    30      16          0.533
   5     35    30      25          0.833
   6     32    30       4          0.133
   7     30    30       0          0.000
   8     31    30       1          0.033
   9     25    30      25          0.833
 Total  300   300      86          2.867

$\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{86}{30} = 2.867$

Degrees of freedom = 10 − 1 = 9.

For $v = 9$, $\chi^2_{0.05} = 16.92$.

The calculated value of χ² is less than the table value and hence our hypothesis
that the digits were distributed in equal numbers holds good.
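A goodness-of-fit test against equal expected frequencies needs only the class counts. The Python sketch below (function name illustrative) uses `fractions.Fraction` so the statistic comes out exactly as 86/30 = 43/15; degrees of freedom are k − 1 because the only restriction is that the totals agree.

```python
from fractions import Fraction


def uniform_gof(observed):
    """Chi-square goodness of fit against equal expected frequencies."""
    n = sum(observed)
    k = len(observed)
    expected = Fraction(n, k)            # 300 / 10 = 30 per digit
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, k - 1                   # one restriction: totals agree


digits = [28, 29, 33, 31, 26, 35, 32, 30, 31, 25]
chi2, df = uniform_gof(digits)
print(chi2, round(float(chi2), 3), df)   # 43/15 2.867 9
```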


Illustration 8. The figures given below are (a) the observed frequencies of a
distribution, and (b) the frequencies of the normal distribution having the same mean,
standard deviation and total frequency as in (a).

(a)   1    5   20   28   42   22   15    5    2
(b)   1    6   18   25   40   25   18    6    1

Apply the χ² test of goodness of fit.
Solution. Since the observed and expected frequencies are less than 10 at the
beginning and end of the series, we shall group the first two classes together and the
last two classes together, and then apply the χ² test.

  O      E     (O − E)²    (O − E)²/E
  6      7        1          0.143
 20     18        4          0.222
 28     25        9          0.360
 42     40        4          0.100
 22     25        9          0.360
 15     18        9          0.500
  7      7        0          0.000
                    Σ = 1.685

$\chi^2 = \sum \frac{(O-E)^2}{E} = 1.685$

Here $v = 4$. This is because the number of degrees of freedom is one for each
class, less one for each "restraint." The 9 original classes have been reduced to
7 by grouping, thus reducing the degrees of freedom by 2. In addition, the mean,
standard deviation and total frequency of the original distribution have been used in
calculating the theoretical frequencies, thus introducing three restraints. The number
of degrees of freedom is accordingly 7 − 3 = 4.

For $v = 4$, $\chi^2_{0.05} = 9.49$. The calculated value of χ² is much less than the table
value and hence the fit is good.
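The grouping of tail classes can be sketched in Python. The helper `pool_tails` is hypothetical (not from the text) and assumes both `left` and `right` are at least 1; degrees of freedom subtract three restraints because N, the mean and the standard deviation were used to fit the normal frequencies.

```python
def pool_tails(observed, expected, left, right):
    """Merge the first `left` and the last `right` classes (both assumed >= 1)."""
    def merge(seq):
        cut = len(seq) - right
        return [sum(seq[:left])] + list(seq[left:cut]) + [sum(seq[cut:])]
    return merge(observed), merge(expected)


observed = [1, 5, 20, 28, 42, 22, 15, 5, 2]
expected = [1, 6, 18, 25, 40, 25, 18, 6, 1]   # normal fit with same N, mean and s.d.

obs, exp = pool_tails(observed, expected, left=2, right=2)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 3        # restraints: total, mean and s.d.
print(obs, exp)          # [6, 20, 28, 42, 22, 15, 7] [7, 18, 25, 40, 25, 18, 7]
print(round(chi2, 3), df)  # 1.685 4
```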
Illustration 9. The following mistakes per page were observed in a book.

No. of mistakes per page      No. of pages
          0                       211
          1                        90
          2                        19
          3                         5
          4                         0
        Total                     325

Fit a Poisson distribution and test the goodness of fit.
(M.A., Econ., Delhi, 1972)


Solution. The expected frequencies of the Poisson distribution have been
obtained in Illustration 12 of the Chapter on Theoretical Distribution. These
frequencies were :

x        0        1        2       3      4
fe     209.43   92.15    20.27   2.97   0.33

The goodness of fit can be tested by applying the χ² test. The last three classes are
grouped together because their expected frequencies are small.

  O        E       (O − E)²    (O − E)²/E
 211     209.43      2.46         0.012
  90      92.15      4.62         0.050
  24      23.57      0.18         0.008
                       Σ = 0.070

$\chi^2 = \sum \frac{(O-E)^2}{E} = 0.070$

$v = 3 - 2 = 1$ (one restraint for the total and one for the mean estimated from the data)

For $v = 1$, $\chi^2_{0.05} = 3.84$.

The calculated value of χ² is less than the table value and hence the fit is good.
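The Poisson fit can also be computed from scratch: estimate the mean from the data (143/325 = 0.44), form N·e^(−m)·m^x/x!, and pool the sparse tail. This Python sketch is an assumption-laden variant, not the book's calculation: it pools "2 or more" as a single tail class (so the tail also absorbs the tiny probability of x ≥ 5), and its expected values (209.31, 92.10, ...) differ slightly from the figures quoted from the earlier chapter. The conclusion is unchanged.

```python
import math

pages = [211, 90, 19, 5, 0]          # pages with 0, 1, 2, 3, 4 mistakes
n = sum(pages)                        # 325
mean = sum(x * f for x, f in enumerate(pages)) / n   # 143 / 325 = 0.44

# Poisson expected frequencies for x = 0 and x = 1; the last class is "2 or more"
e0 = n * math.exp(-mean)
e1 = e0 * mean
expected = [e0, e1, n - e0 - e1]     # tail class absorbs all x >= 2
observed = [pages[0], pages[1], sum(pages[2:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1           # total fixed, mean estimated
print([round(e, 2) for e in expected])   # [209.31, 92.1, 23.59]
print(round(chi2, 3), df)                # 0.068 1
```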
Illustration 10. In an experiment on pea breeding, the following frequencies of
seeds were obtained : 316 round and yellow, 102 wrinkled and yellow, 109 round and
green, 33 wrinkled and green ; total 560. Theory predicts that the frequencies should
be in the proportions 9 : 3 : 3 : 1. Use the χ² test to examine the correspondence between
theory and experiment. (M. Com., Meerut, 1975)

Solution. Let us take the hypothesis that the difference in the observed and
expected frequencies is not significant. On theoretical considerations, out of 560
the expected frequencies should be $\frac{560 \times 9}{16} = 315$, $\frac{560 \times 3}{16} = 105$,
$\frac{560 \times 3}{16} = 105$ and $\frac{560 \times 1}{16} = 35$ respectively. Applying χ² test :

  O       E     (O − E)²    (O − E)²/E
 316     315       1           0.003
 102     105       9           0.086
 109     105      16           0.152
  33      35       4           0.114
                      Σ = 0.355

Degrees of freedom = 4 − 1 = 3

For $v = 3$, $\chi^2_{0.05} = 7.82$. The calculated value of χ² is much less than
the table value. Hence there is no reason to doubt the hypothesis, and there is close
correspondence between theory and experiment.
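When theory fixes the class proportions, as in the 9 : 3 : 3 : 1 ratio here, each expected frequency is N × part / total parts. A minimal Python sketch with an illustrative function name; it keeps full precision, so the statistic is 0.356 rather than the 0.355 obtained by summing rounded terms.

```python
def ratio_gof(observed, ratio):
    """Chi-square goodness of fit when theory fixes the class proportions."""
    n = sum(observed)
    parts = sum(ratio)
    expected = [n * r / parts for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2, len(observed) - 1


# Illustration 10: round-yellow, wrinkled-yellow, round-green, wrinkled-green
expected, chi2, df = ratio_gof([316, 102, 109, 33], [9, 3, 3, 1])
print(expected)              # [315.0, 105.0, 105.0, 35.0]
print(round(chi2, 3), df)    # 0.356 3
```

The same function covers the 4 : 3 : 2 : 1 examination ratio of Illustration 12, where it gives about 23.667 with 3 degrees of freedom.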


Additive Property of χ²

One of the merits of the χ² test as an instrument of research is that it is
possible to combine the independently derived values of χ² relating to
samples of similar data by the simple process of addition. This enables a
better (because more comprehensive) test than could be made using the
data of any one sample by itself. The sum of the χ² values thus combined
will itself have a χ² distribution with degrees of freedom equal to the sum
of the degrees of freedom of the separate χ² values. However, while adding
χ² values, two points must be remembered :

1. Combining the results in a single inclusive test is appropriate
only when the samples are independent ; and

2. When χ² values are to be added, Yates' corrections should not be
applied, because the addition theorem holds only for uncorrected constituent
items.

The following example will illustrate the additive property :


Illustration 11. The following values of χ² are obtained in a survey of four
districts about inoculation and attack from cholera. What conclusion would you draw
for the four districts taken together ?

Districts    d.f.      χ²
    A          1      3.62
    B          1      3.48
    C          1      2.05
    D          1      5.43

Solution. In the case of the first three districts, since the value of χ² is less than 3.84,
the results of the tests are not significant. However, in the case of district D the result is
significant at 5% level. But when the combined test is carried out, the sum 14.58 is
tested with $v = 4$. For 4 degrees of freedom $\chi^2_{0.05} = 9.488$. The combined value of χ²
(14.58) is higher than the table value (9.488) and hence the combined result does not
support the hypothesis.
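The additive property means the combined test is just a sum of χ² values and a sum of degrees of freedom. The book then reads the 5% point from a table; as an alternative, for an even number of degrees of freedom the upper-tail probability has the closed form e^(−x/2) Σ_{j<k/2} (x/2)^j / j!, a standard identity that this Python sketch uses (the function name is illustrative).

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for an EVEN number of degrees of freedom:
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!"""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Illustration 11: one chi-square with 1 d.f. per district; add values and d.f.
district_chi2 = {"A": 3.62, "B": 3.48, "C": 2.05, "D": 5.43}
total = sum(district_chi2.values())
df = len(district_chi2)               # 1 d.f. each
p = chi2_upper_tail_even_df(total, df)
print(round(total, 2), df, round(p, 4))   # 14.58 4 0.0057
```

As a sanity check, the same function returns about 0.05 at the tabulated 5% points 9.488 (4 d.f.) and 5.991 (2 d.f.).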


MISCELLANEOUS ILLUSTRATIONS 


Illustration 12. A sample analysis of examination results of 500 students was
made. It was found that 220 students had failed, 170 had secured a third class, 90 were
placed in second class and 20 got a first class. Are these figures commensurate with the
general examination result, which is in the ratio of 4 : 3 : 2 : 1 for the various categories
respectively (the table value of χ² for 3 d.f. at 5% level of significance is 7.81)?
(M.B.A., Delhi, 1972)

Solution. We have to test the hypothesis that the observed results are commensurate
with the general examination result, which is in the ratio of 4 : 3 : 2 : 1. The expected
numbers of students who have failed, obtained a third class, second class
and first class respectively will be $\frac{500 \times 4}{10} = 200$, $\frac{500 \times 3}{10} = 150$,
$\frac{500 \times 2}{10} = 100$ and $\frac{500 \times 1}{10} = 50$.

Applying χ² test :

  O       E     (O − E)²    (O − E)²/E
 220     200      400          2.000
 170     150      400          2.667
  90     100      100          1.000
  20      50      900         18.000
                      Σ = 23.667

$\chi^2 = \sum \frac{(O-E)^2}{E} = 23.667$

For $v = 4 - 1 = 3$, $\chi^2_{0.05} = 7.81$.

Since the calculated value of χ² is greater than the table value, our hypothesis
does not hold good.
Illustration 13. A certain drug was administered to 456 males out of a total of
720 in a certain locality to test its efficacy against typhoid. The incidence of typhoid
is shown below. Find out the effectiveness of the drug against the disease.

                                  Infection    No infection    Total
Administering the drug              144            312          456
Without administering the drug      192             72          264
Total                               336            384          720

(M. Com., Raj., 1974)

Solution. Let us take the hypothesis that the drug is not effective in checking
typhoid. Applying the χ² test :

Expectation of (AB) $= \frac{(A) \times (B)}{N} = \frac{456 \times 336}{720} = 212.8$ or 213

The table of expected frequencies is given below :

      213     243     456
      123     141     264
      336     384     720

  O      E     (O − E)²    (O − E)²/E
 144    213     4,761        22.352
 192    123     4,761        38.707
 312    243     4,761        19.593
  72    141     4,761        33.766
                     Σ = 114.418

$\chi^2 = \sum \frac{(O-E)^2}{E} = 114.418$

$v = (r-1)(c-1) = (2-1)(2-1) = 1$

For $v = 1$, $\chi^2_{0.05} = 3.84$.

The calculated value of χ² is much greater than the table value and hence the
hypothesis is rejected. We, therefore, conclude that the drug is effective in preventing
typhoid.
Illustration 14. Four coins were tossed 160 times and the following results were
obtained :

No. of heads             0     1     2     3     4
Observed frequencies    17    52    54    31     6

Under the assumption that the coins are balanced, find the expected frequencies of
getting 0, 1, 2, 3 or 4 heads and test the goodness of fit. (M. Com., Delhi, 1973)

Solution. On the assumption that the coins are balanced, the expected
frequencies of getting 0, 1, 2, 3 and 4 heads will be given by the expansion

$160\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^4 = 160\left(\tfrac{1}{16} + \tfrac{4}{16} + \tfrac{6}{16} + \tfrac{4}{16} + \tfrac{1}{16}\right) = 10, 40, 60, 40, 10.$

Applying χ² test :

No. of heads    Observed (O)    Expected (E)    (O − E)²    (O − E)²/E
     0               17              10             49          4.900
     1               52              40            144          3.600
     2               54              60             36          0.600
     3               31              40             81          2.025
     4                6              10             16          1.600
                                                       Σ = 12.725

$\chi^2 = \sum \frac{(O-E)^2}{E} = 12.725$

$v = 5 - 1 = 4$

For $v = 4$, $\chi^2_{0.05} = 9.49$.

The calculated value of χ² is greater than the table value and hence the fit is
poor.
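For fair coins the expected frequencies are N·C(n, k)/2ⁿ for k = 0, 1, ..., n heads, which is exactly the binomial expansion used above. A short Python sketch (function names illustrative; `math.comb` needs Python 3.8 or later):

```python
from math import comb


def fair_coin_expected(n_coins, n_trials):
    """Expected frequencies of 0..n heads for fair coins: N * C(n, k) / 2**n."""
    return [n_trials * comb(n_coins, k) / 2 ** n_coins for k in range(n_coins + 1)]


def gof(observed, expected):
    """Sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = fair_coin_expected(4, 160)
print(expected)                                      # [10.0, 40.0, 60.0, 40.0, 10.0]
print(round(gof([17, 52, 54, 31, 6], expected), 3))  # 12.725
```

With 5 coins the same helper gives the expected frequencies used in Illustration 19 (3,200 tosses) and Illustration 26 (320 families, assuming boys and girls equally likely).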
Illustration 15. 1,600 families were selected at random in a city to test the belief
that high-income families usually send their children to public schools and low-income
families often send their children to government schools. The following results were
obtained :

                     School
Income       Public    Government    Total
Low           494         506        1,000
High          162         438          600
Total         656         944        1,600

Test whether income and type of school are independent. (M. Com., Delhi, 1975)

Solution. We take the hypothesis that there is no association between income
and type of school.

Expectation of (AB) $= \frac{(A) \times (B)}{N} = \frac{656 \times 1000}{1600} = 410$

The table of expected frequencies is given below :

                     School
Income       Public    Government    Total
Low           410         590        1,000
High          246         354          600
Total         656         944        1,600

  O      E     (O − E)²    (O − E)²/E
 494    410     7,056        17.210
 162    246     7,056        28.683
 506    590     7,056        11.959
 438    354     7,056        19.932
                     Σ = 77.784

$\chi^2 = \sum \frac{(O-E)^2}{E} = 77.784$

$v = (r-1)(c-1) = (2-1)(2-1) = 1$

The table value of χ² for 1 degree of freedom at 1% level is 6.64. The calculated
value of χ² is much higher than the table value and hence the hypothesis stands rejected.
We, therefore, conclude that there is association between family income and type of
schooling.

Illustration 16. A movie producer is bringing out a new movie. In order to
map out his advertising campaign, he wants to determine whether the movie will appeal
most to a particular age group or whether it will appeal equally to all age groups. The
producer takes a random sample from persons attending a preview of the new movie,
and obtains the following results :

                                     Age groups
                      Under 20    20-39     40-59     60 & over    Total
Liked the movie       146 (a)     78 (d)    48 (g)    28 (j)        300
Disliked the movie     54 (b)     22 (e)    42 (h)    22 (k)        140
Indifferent            20 (c)     10 (f)    10 (i)    20 (l)         60
Total                 220        110       100        70            500

What inference will you draw from these data ?

Solution. Let us take the hypothesis that the movie appeals equally to all age
groups.

Expectation of (a) $= \frac{300}{500} \times 220 = 132$

Expectation of (b) $= \frac{140}{500} \times 220 = 61.6$

Expectation of (d) $= \frac{300}{500} \times 110 = 66$

Expectation of (e) $= \frac{140}{500} \times 110 = 30.8$

Expectation of (g) $= \frac{300}{500} \times 100 = 60$

Expectation of (h) $= \frac{140}{500} \times 100 = 28$

The expected frequencies can be tabulated as below :

      132       66       60       42      300
       61.6     30.8     28       19.6    140
       26.4     13.2     12        8.4     60
      220      110      100       70      500

Applying χ² test :

  O       E       (O − E)²    (O − E)²/E
 146    132.0      196.00        1.485
  54     61.6       57.76        0.938
  20     26.4       40.96        1.552
  78     66.0      144.00        2.182
  22     30.8       77.44        2.514
  10     13.2       10.24        0.776
  48     60.0      144.00        2.400
  42     28.0      196.00        7.000
  10     12.0        4.00        0.333
  28     42.0      196.00        4.667
  22     19.6        5.76        0.294
  20      8.4      134.56       16.019
                         Σ = 40.159

$\chi^2 = \sum \frac{(O-E)^2}{E} = 40.159$

$v = (r-1)(c-1) = (3-1)(4-1) = 6$

For $v = 6$, $\chi^2_{0.05} = 12.59$.

The calculated value of χ² is much greater than the table value. The hypothesis
does not hold good and we, therefore, conclude that the movie does not appeal equally
to all age groups.


Illustration 17. From the adult male population of seven large cities, random
samples giving the 2 × 7 contingency table of married and unmarried men given below
were taken. Can it be said that there is a significant variation among the cities in the
tendency of men to marry ?

City          A     B     C     D     E     F     G     Total
Married      133   164   155   106   153   123   146     980
Unmarried     36    57    40    37    55    33    36     294
Total        169   221   195   143   208   156   182   1,274

[At (2 − 1)(7 − 1) d.f. take $\chi^2_{0.05} = 12.6$]
(B. Com., Gujarat, 1966 ; M.A. Econ., Punjab, 1976)

Solution. Let us take the hypothesis that there is no significant variation among
the cities in the tendency of men to marry.

Expected number of married people in City A $= \frac{980}{1274} \times 169 = 130$

Expected number of married people in City B $= \frac{980}{1274} \times 221 = 170$

Expected number of married people in City C $= \frac{980}{1274} \times 195 = 150$

Expected number of married people in City D $= \frac{980}{1274} \times 143 = 110$

Expected number of married people in City E $= \frac{980}{1274} \times 208 = 160$

Expected number of married people in City F $= \frac{980}{1274} \times 156 = 120$

Expected number of married people in City G $= \frac{980}{1274} \times 182 = 140$

The table of expected frequencies is given below :

City          A     B     C     D     E     F     G     Total
Married      130   170   150   110   160   120   140     980
Unmarried     39    51    45    33    48    36    42     294
Total        169   221   195   143   208   156   182   1,274

Applying χ² test :

  O      E     (O − E)²    (O − E)²/E
 133    130       9          0.069
 164    170      36          0.212
 155    150      25          0.167
 106    110      16          0.145
 153    160      49          0.306
 123    120       9          0.075
 146    140      36          0.257
  36     39       9          0.231
  57     51      36          0.706
  40     45      25          0.556
  37     33      16          0.485
  55     48      49          1.021
  33     36       9          0.250
  36     42      36          0.857
                    Σ = 5.337

$\chi^2 = \sum \frac{(O-E)^2}{E} = 5.337$

$v = (r-1)(c-1) = (2-1)(7-1) = 6$

For $v = 6$, $\chi^2_{0.05} = 12.59$.

The calculated value of χ² is less than the table value. Hence the hypothesis
holds good. We, therefore, conclude that there is no significant variation among the
cities in the tendency of men to marry.


Illustration 18. An automobile manufacturing company is bringing out a new
model. In order to map out its advertising campaign, it wants to determine whether
the model will appeal most to a particular age group or equally to all age groups. The
firm takes a random sample from persons attending a preview of the new model and
obtains the results summarized below :

                              Age groups
Persons who :        Under 20    20-39    40-59    60 & over    Total
Liked the car           200        70       60        70         400
Disliked the car         50        30       20        50         150
Total                   250       100       80       120         550

Can it be concluded that the new model appeals equally to all age groups ?

Solution. Let us take the hypothesis that the new model appeals equally to all
age groups.

Expected frequency corresponding to first row 1st column $= \frac{400}{550} \times 250 = 182$

Expected frequency corresponding to first row 2nd column $= \frac{400}{550} \times 100 = 73$

Expected frequency corresponding to first row 3rd column $= \frac{400}{550} \times 80 = 58$

The table of expected frequencies will be as follows :

      182      73      58      87      400
       68      27      22      33      150
      250     100      80     120      550

  O      E     (O − E)²    (O − E)²/E
 200    182      324         1.780
  50     68      324         4.765
  70     73        9         0.123
  30     27        9         0.333
  60     58        4         0.069
  20     22        4         0.182
  70     87      289         3.322
  50     33      289         8.758
                    Σ = 19.332

$\chi^2 = \sum \frac{(O-E)^2}{E} = 19.332$

$v = (r-1)(c-1) = (2-1)(4-1) = 3$

For $v = 3$, $\chi^2_{0.05} = 7.815$.

The calculated value of χ² is greater than the table value. Hence the hypothesis
does not hold true. We, therefore, conclude that the new model does not appeal equally
to all age groups.


Illustration 19. A set of 5 coins is tossed 3,200 times, and the number of heads
appearing each time is noted. The results are given below :

No. of heads     0     1      2      3     4     5
Frequency       80   570   1100    900   500    50

Test the hypothesis that the coins are unbiased. (M. Com., Meerut, 1973)

Solution. Let us take the hypothesis that the coins are unbiased. If this is true,
the chances of getting 0, 1, 2, ..., 5 heads in a toss of 5 coins are the successive terms in
the binomial $\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^5$. So the theoretical frequencies in 3,200 tosses are the
terms in the expansion $3200\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^5$ and are as follows :

No. of heads     0     1      2      3     4     5
Frequencies    100   500   1000   1000   500   100

Applying χ² test :

   O       E      (O − E)²    (O − E)²/E
   80     100        400         4.00
  570     500      4,900         9.80
 1100    1000     10,000        10.00
  900    1000     10,000        10.00
  500     500          0         0.00
   50     100      2,500        25.00
                         Σ = 58.80

$\chi^2 = \sum \frac{(O-E)^2}{E} = 58.80$

Degrees of freedom = 6 − 1 = 5

For $v = 5$, $\chi^2_{0.05} = 11.07$.

The calculated value of χ² is much greater than the table value. Hence the
hypothesis is rejected. We, therefore, conclude that the coins are biased.

Illustration 20. Do the following figures provide evidence of the effectiveness
of inoculation ?

                   Attacked    Not attacked    Total
Inoculated            20           300           320
Not inoculated        80           600           680
Total                100           900         1,000

(M. Com., Meerut, 1974)

Solution. Let us take the hypothesis that inoculation is not effective in
preventing attack. Applying χ² test :

Expectation of (AB) $= \frac{(A) \times (B)}{N} = \frac{100 \times 320}{1000} = 32$

The table of expected frequencies is given below :

       32      288      320
       68      612      680
      100      900    1,000

  O      E     (O − E)²    (O − E)²/E
  20     32      144         4.500
  80     68      144         2.118
 300    288      144         0.500
 600    612      144         0.235
                    Σ = 7.353

$\chi^2 = \sum \frac{(O-E)^2}{E} = 7.353$

$v = (r-1)(c-1) = (2-1)(2-1) = 1$

For $v = 1$, $\chi^2_{0.05} = 3.84$.

The calculated value of χ² is more than the table value. Hence the hypothesis
does not hold good. We, therefore, conclude that inoculation is effective.


Illustration 21. From the adult male population of four large cities, random
samples of sizes given below were taken and the number of married and single men
recorded. Do the data indicate any significant variation among the cities in the
tendency of men to marry ?

City          A          B          C          D        Total
Married    137 (a)    164 (c)    152 (e)    147 (g)      600
Single      32 (b)     57 (d)     56 (f)     35 (h)      180
Total      169        221        208        182          780

(M. Com. Meerut, 1975)

Solution. Let us take the hypothesis that there is no variation among the
cities in the tendency of men to marry. Applying χ² test.
The expected frequencies are :

Corresponding to (a) $= \frac{600}{780} \times 169 = 130$

Corresponding to (c) $= \frac{600}{780} \times 221 = 170$

Corresponding to (e) $= \frac{600}{780} \times 208 = 160$

Corresponding to (g) $= \frac{600}{780} \times 182 = 140$

The table of expected frequencies is :

      130     170     160     140     600
       39      51      48      42     180
      169     221     208     182     780

  O      E     (O − E)²    (O − E)²/E
 137    130      49          0.377
  32     39      49          1.256
 164    170      36          0.212
  57     51      36          0.706
 152    160      64          0.400
  56     48      64          1.333
 147    140      49          0.350
  35     42      49          1.167
                    Σ = 5.801

$\chi^2 = \sum \frac{(O-E)^2}{E} = 5.801$

$v = (r-1)(c-1) = (2-1)(4-1) = 3$

For $v = 3$, $\chi^2_{0.05} = 7.82$.

The calculated value of χ² is less than the table value. The hypothesis holds true.
We, therefore, conclude that the data do not indicate significant variation among the
cities in the tendency of men to marry.


Illustration 22. Verify whether a Poisson distribution can be assumed from the
data given below :

No. of defects :      0       1       2       3      4      5
Observed (fo) :       6      13      13       8      4      3
Expected (fe) :    6.24   13.52   13.52    9.01   4.50   1.80

Solution. Applying χ² test :

No. of defects    O       E       (O − E)²    (O − E)²/E
      0           6      6.24      0.0576        0.009
      1          13     13.52      0.2704        0.020
      2          13     13.52      0.2704        0.020
      3           8      9.01      1.0201        0.113
      4           4      4.50      0.2500        0.056
      5           3      1.80      1.4400        0.800
                                Σ = 1.018

$\chi^2 = \sum \frac{(O-E)^2}{E} = 1.018$

$v = n - 2 = 6 - 2 = 4$ (one restraint for the total and one for the mean)

For $v = 4$, $\chi^2_{0.05} = 9.49$.

The calculated value of χ² is much less than the table value. Hence the Poisson
distribution can be assumed.


Illustration 23. The number of books borrowed from a public library during a
particular week is given below. Test the hypothesis that the number of books borrowed
does not depend on the day of the week.

Days          No. of books borrowed
Monday                140
Tuesday               132
Wednesday             160
Thursday              148
Friday                134
Saturday              150

Solution. Let us take the hypothesis that the tendency to borrow books is
independent of the day of the week.

The number of books borrowed during the six days = 864. We should expect
864/6 = 144 books to be borrowed on each day of the week. Applying χ² test :

Days          O      E     (O − E)²    (O − E)²/E
Monday       140    144       16          0.111
Tuesday      132    144      144          1.000
Wednesday    160    144      256          1.778
Thursday     148    144       16          0.111
Friday       134    144      100          0.694
Saturday     150    144       36          0.250
                                Σ = 3.944

$\chi^2 = \sum \frac{(O-E)^2}{E} = 3.944$

$v = 6 - 1 = 5$

For $v = 5$, $\chi^2_{0.05} = 11.07$.

The calculated value of χ² is less than the table value and hence the hypothesis is
accepted. We, therefore, conclude that the number of books borrowed does not depend on
the day of the week.


Illustration 24. The following table shows the number of people interviewed by
age groups and the number in each age group estimated to have peptic ulcer :

Age group           20-25   25-30   30-35   35-40   40-45   45-50   50-55   55-60   Total
Nos. interviewed      50     100     200     350     400     250     150     100    1,600
P.U. cases             5      12      25      28      40      27      12      11      160

Do the data reveal an association between age groups and peptic ulcer ?




Solution. Let us take the hypothesis that peptic ulcer is not associated with age
groups, i.e., the two attributes are independent. Since 160 out of 1,600 persons
interviewed (10%) have peptic ulcer, the expected number of cases in each age group is
10% of the number interviewed.

Age groups    Observed cases    Expected cases
  20-25              5                 5
  25-30             12                10
  30-35             25                20
  35-40             28                35
  40-45             40                40
  45-50             27                25
  50-55             12                15
  55-60             11                10

Applying χ² test :

  O      E     (O − E)²    (O − E)²/E
  5      5        0          0.000
 12     10        4          0.400
 25     20       25          1.250
 28     35       49          1.400
 40     40        0          0.000
 27     25        4          0.160
 12     15        9          0.600
 11     10        1          0.100
                    Σ = 3.910

$\chi^2 = \sum \frac{(O-E)^2}{E} = 3.910$

$v = 8 - 1 - 1 = 6$

For $v = 6$, $\chi^2_{0.05} = 12.59$.

The calculated value of χ² is less than the table value. Hence the hypothesis is
accepted. We, therefore, conclude that peptic ulcer is not associated with age groups.
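Here the expected cases come from one pooled rate applied to every group. This Python sketch mirrors the book's calculation, which uses only the ulcer-case counts; the variable names are illustrative.

```python
interviewed = [50, 100, 200, 350, 400, 250, 150, 100]   # age groups 20-25 ... 55-60
ulcer_cases = [5, 12, 25, 28, 40, 27, 12, 11]

# pooled rate of peptic ulcer across all age groups: 160 / 1600
rate = sum(ulcer_cases) / sum(interviewed)
expected = [rate * m for m in interviewed]               # 5, 10, 20, 35, 40, 25, 15, 10
chi2 = sum((o - e) ** 2 / e for o, e in zip(ulcer_cases, expected))
print(rate)
print(round(chi2, 2))   # 3.91
```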

Illustration 25. You are given a sample of 150 observations classified by two
attributes A and B as follows :

          A1     A2     A3    Total
B1        40     25     15      80
B2        11     26      8      45
B3         9      9      7      25
Total     60     60     30     150

Use the χ² test to examine whether A and B are associated.
(M. A. Econ., Patiala, 1975)

Solution. Let us take the hypothesis that A and B are not associated.

Expected frequency corresponding to (A1B1) $= \frac{80 \times 60}{150} = 32$

Expected frequency corresponding to (A2B1) $= \frac{80 \times 60}{150} = 32$

Expected frequency corresponding to (A1B2) $= \frac{45 \times 60}{150} = 18$

Expected frequency corresponding to (A2B2) $= \frac{45 \times 60}{150} = 18$

The table of expected frequencies would be :

          A1     A2     A3    Total
B1        32     32     16      80
B2        18     18      9      45
B3        10     10      5      25
Total     60     60     30     150

Applying χ² test :

  O      E     (O − E)²    (O − E)²/E
 40     32       64          2.000
 11     18       49          2.722
  9     10        1          0.100
 25     32       49          1.531
 26     18       64          3.556
  9     10        1          0.100
 15     16        1          0.063
  8      9        1          0.111
  7      5        4          0.800
                    Σ = 10.983

$\chi^2 = \sum \frac{(O-E)^2}{E} = 10.983$

$v = (r-1)(c-1) = (3-1)(3-1) = 4$

For $v = 4$, $\chi^2_{0.05} = 9.49$.

The calculated value of χ² is greater than the table value. Hence the hypothesis
is rejected and we conclude that A and B are associated.


Illustration 26. A survey of 320 families with 5 children each revealed the
following distribution :

No. of boys          5     4     3     2     1     0
No. of girls         0     1     2     3     4     5
No. of families     14    56   110    88    40    12

Is this result consistent with the hypothesis that male and female births are
equally probable ? You may use the following table giving the values of Chi-square.
(M. Com., Delhi, 1975)

Solution. On the hypothesis that male and female births are equally probable,
the expected numbers of families would be obtained by the expansion of
$320\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^5$ :

$320\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^5 = 320\left(\tfrac{1}{32} + \tfrac{5}{32} + \tfrac{10}{32} + \tfrac{10}{32} + \tfrac{5}{32} + \tfrac{1}{32}\right) = 10, 50, 100, 100, 50, 10$

Applying χ² test :

   O      E     (O − E)²    (O − E)²/E
  14     10       16          1.60
  56     50       36          0.72
 110    100      100          1.00
  88    100      144          1.44
  40     50      100          2.00
  12     10        4          0.40
                     Σ = 7.16

$\chi^2 = \sum \frac{(O-E)^2}{E} = 7.16$

$v = 6 - 1 = 5$

For $v = 5$, $\chi^2_{0.05} = 11.07$.

The calculated value of χ² is less than the table value and hence the hypothesis
holds good. We, therefore, conclude that the results are consistent with male and female
births being equally probable.
Illustration 27. The contingency table below summarizes the results obtained in
a study conducted by a research organisation, with respect to the performance of four
competing brands of toothpaste among the users :

                          Brand A    Brand B    Brand C    Brand D    Total
No cavities                  9         13         17         11         50
One to five cavities        63         70         85         82        300
More than five cavities     28         37         48         37        150
Total                      100        120        150        130        500

Test the hypothesis that incidence of cavities is independent of the brand of the
toothpaste used. (The table values of χ² for 6 d.f. are 12.59 and 16.81 at 5 per cent and
1 per cent levels of significance respectively.) (M. A. Sociology, Bombay, 1976)

Solution. Our hypothesis is that incidence of cavities is independent of the
brand of the toothpaste used. Applying χ² test.

Expected frequency corresponding to first row first column $= \frac{50}{500} \times 100 = 10$

Expected frequency corresponding to second row first column $= \frac{300}{500} \times 100 = 60$

Expected frequency corresponding to first row second column $= \frac{50}{500} \times 120 = 12$

Expected frequency corresponding to second row second column $= \frac{300}{500} \times 120 = 72$

Expected frequency corresponding to first row third column $= \frac{50}{500} \times 150 = 15$

Expected frequency corresponding to second row third column $= \frac{300}{500} \times 150 = 90$

The complete table of expected frequencies shall be :

                          Brand A    Brand B    Brand C    Brand D    Total
No cavities                 10         12         15         13         50
One to five cavities        60         72         90         78        300
More than five cavities     30         36         45         39        150
Total                      100        120        150        130        500

Applying χ² test :

  O      E     (O − E)²    (O − E)²/E
  9     10        1          0.100
 63     60        9          0.150
 28     30        4          0.133
 13     12        1          0.083
 70     72        4          0.056
 37     36        1          0.028
 17     15        4          0.267
 85     90       25          0.278
 48     45        9          0.200
 11     13        4          0.308
 82     78       16          0.205
 37     39        4          0.103
                    Σ = 1.910

$\chi^2 = \sum \frac{(O-E)^2}{E} = 1.910$

$v = (r-1)(c-1) = (3-1)(4-1) = 6$

For $v = 6$, $\chi^2_{0.05} = 12.59$. The calculated value of χ² is less than the table value.
Our hypothesis holds true. Hence, incidence of cavities is independent of the brand of
the toothpaste used.


Illustration, 28. Out of a sample of 120 persons in a village, 76 persons were 
administered a new drug for preventive influenza and out of them, 24 persons were 
attacked by influenza. Out of those who were administered the new drug, 12 persons 

Prepare ; 

(a) 2x2 table showing actual and expected frequencies, 


cc (5) Use Chi-square test for finding out whether the new drug is effective or 
a 


(At 5% level for one degree of fr. the value of chi- is 3°84. 
УА пе degree of freedom the value of chi-square is eA, 1979) 


Solution, The given information can be tabulate as follows. 


Attacked not attacked Total 
Drug 24 52 76 
No drug. 32 12 44 
Total 56 64 120 


ie Let us take the Бурын als that the new drug is not effective in preventing 
influenza. Oa the basis o this hypothesis the expectation of (AB) i.e. number of 


persons given drug and attacked shall be 4 X55—3547, The other expected fre- 
quencies can be now easily obtained. 
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3547 | 4053 | 76 


72053 | 2347 | 44 


64 |120 | 
Applying x? test 
[^ E (O—EY. (O—E)*/E 
24 35°47 13156 37709 
32 20:53 13156 67408 
52 40:53 13156 3/246 
12 2347 13156 5:605 
3 (O—E)*/E=18'968 


χ² = Σ(O−E)²/E = 18.968
v = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1

For v = 1, χ²0.05 = 3.84. The calculated value of χ² is greater than the table value, so the hypothesis is rejected. We therefore conclude that the new drug is effective in preventing influenza.
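For a 2 × 2 table the four (O − E)²/E terms collapse into a single, well-known formula that needs no expected frequencies at all. The sketch below (function name ours) applies it to the drug data. Because it avoids rounding the expected values to two decimals, it gives 18.96 instead of 18.968, which does not affect the conclusion. No continuity correction is applied, matching the working above.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]], no continuity correction.

    N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)] equals the sum of (O - E)^2 / E
    over the four cells, with E = row total * column total / N.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: drug, no drug; columns: attacked, not attacked (Illustration 28)
chi2 = chi_square_2x2(24, 52, 32, 12)
print(round(chi2, 2))   # 18.96
print(chi2 > 3.84)      # True: significant at the 5% level for 1 d.f.
```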
Illustration 29. The table below shows the relation between the performances of the same students in Accountancy and Statistics. Test the hypothesis that there is no correlation between the accountancy and statistics performances.

                         Accountancy grades
Statistics grades     (1)     (2)     (3)    Total
(1)                    50      70      10     130
(2)                    40     160      15     215
(3)                    10      20      25      55
Total                 100     250      50     400
Ж 
Solution. We have to test the hypothesis that there is no correlation between accountancy and statistics performances. Applying the χ² test:

Expected frequency corresponding to first row first column = (130 × 100)/400 = 32.50
Expected frequency corresponding to second row first column = (215 × 100)/400 = 53.75
Expected frequency corresponding to first row second column = (130 × 250)/400 = 81.25
Expected frequency corresponding to second row second column = (215 × 250)/400 = 134.38

The remaining expected frequencies are obtained by subtraction from the row and column totals. The table of expected frequencies shall be:

 32.50     81.25    16.25    130
 53.75    134.38    26.87    215
 13.75     34.37     6.88     55
100       250       50       400

  O        E        (O−E)²    (O−E)²/E
 50      32.50     306.25      9.423
 40      53.75     189.06      3.517
 10      13.75      14.06      1.023
 70      81.25     126.56      1.558
160     134.38     656.38      4.885
 20      34.37     206.50      6.008
 10      16.25      39.06      2.404
 15      26.87     140.90      5.244
 25       6.88     328.33     47.723
                   Σ(O−E)²/E = 81.785

v = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4
For v = 4, χ²0.05 = 9.49.

The calculated value of χ² is much greater than the table value. The hypothesis is rejected. We conclude that there is correlation between performance in statistics and accountancy.
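Why v = (r − 1)(c − 1)? Once the row and column totals are fixed, only the cells outside the last row and the last column can be chosen freely; the rest follow by subtraction, which is also how the remaining expected frequencies are obtained in practice. A small Python sketch (the helper name is hypothetical) using the margins of Illustration 29:

```python
def expected_by_subtraction(row_totals, col_totals):
    """Expected frequencies for an r x c table, computing only the
    (r - 1)(c - 1) free cells as row total * column total / N and
    obtaining the last column and the last row by subtraction."""
    n = sum(row_totals)
    r, c = len(row_totals), len(col_totals)
    table = [[0.0] * c for _ in range(r)]
    free_cells = 0
    for i in range(r - 1):
        for j in range(c - 1):
            table[i][j] = row_totals[i] * col_totals[j] / n
            free_cells += 1
        # last column of row i: what is left of the row total
        table[i][c - 1] = row_totals[i] - sum(table[i][:c - 1])
    for j in range(c):
        # last row: what is left of each column total
        table[r - 1][j] = col_totals[j] - sum(table[k][j] for k in range(r - 1))
    return table, free_cells


# Margins of Illustration 29: statistics grades (rows), accountancy grades (columns)
table, free_cells = expected_by_subtraction([130, 215, 55], [100, 250, 50])
print(free_cells)                         # 4, i.e. (3 - 1)(3 - 1) degrees of freedom
print([round(x, 3) for x in table[2]])    # [13.75, 34.375, 6.875]
```

The cells obtained by subtraction agree with row total × column total ÷ N, so nothing is lost by computing only the free cells.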
Illustration 30. Two researchers adopted different sampling techniques while investigating the same group of students to find the number of students falling in different intelligence levels. The results are as follows:

                      No. of students in each level
Researcher   Below average   Average   Above average   Genius   Total
X                 86            60           44          10       200
Y                 40            33           25           2       100
Total            126            93           69          12       300

Would you say that the sampling techniques adopted by the two researchers are significantly different? [For v = 3, χ²0.05 = 7.82]  (M. Com., Delhi, 1977)


Solution. Let us take the hypothesis that the sampling techniques adopted by the two researchers are not significantly different. Applying the χ² test:

Expected frequency for first row first column = (200 × 126)/300 = 84
Expected frequency for first row second column = (200 × 93)/300 = 62
Expected frequency for first row third column = (200 × 69)/300 = 46

The remaining expected frequencies are obtained by subtraction from the totals.

Table of expected frequencies

 84    62    46     8    200
 42    31    23     4    100
126    93    69    12    300

 O     E    (O−E)²   (O−E)²/E
86    84      4       0.048
40    42      4       0.095
60    62      4       0.065
33    31      4       0.129
44    46      4       0.087
25    23      4       0.174
10     8      4       0.500
 2     4      4       1.000
            Σ(O−E)²/E = 2.098

v = (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3. For v = 3, χ²0.05 = 7.82.


The calculated value of χ² is less than the table value. Hence, the sampling techniques adopted by the two researchers are not significantly different.


Misuse of Chi-square Test 


Probably one of the most frequently used statistics is the chi-square. Unfortunately, it is also one of the most frequently misused. It is easy to learn to compute, but its correct application is not so easily learnt.


The most common mistake in the application of the chi-square statistic, and yet the most critical for its correct application, is the violation of independence between measures or events. This assumption of independence is not to be confused with the chi-square as a test of independence. The assumption of independence refers to the individual observations or frequencies and means that the occurrence of one event has no effect upon the occurrence of any other event. Another way of stating this meaning of independence is that the probability of each event's occurrence is independent of the probability of occurrence of all other events. In statistical terms we say that the joint probability of two random events is equal to the product of the probabilities of these events.

The chi-square as a test of independence refers to the statistical test of the possibility of a relationship between two variables. This is often called a test of association; the question tested is whether the frequencies observed in one category are contingent upon another category. For example, is the number of "yes" answers to some question contingent upon the age of the respondent?

Some other sources of errors in the application of the χ² test, as revealed by a survey of all papers published in the Journal of Experimental Psychology during the years 1944-46, are:

(i) Small theoretical frequencies.
(ii) Neglect of frequencies of non-occurrence.
(iii) Failure to equalize the sum of observed frequencies and the sum of the theoretical frequencies.
(iv) Indeterminate theoretical frequencies.
(v) Incorrect or questionable categorizing.
(vi) Use of non-frequency data.
(vii) Incorrect determination of the number of degrees of freedom.
(viii) Incorrect computations.
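Several of the errors in this list (small theoretical frequencies, unequal sums of observed and theoretical frequencies, and non-frequency data) can be caught mechanically before any χ² is computed. The following Python sketch shows one way to do this. The function name and the cut-off of 5 for a "small" expected frequency are assumptions: 5 is a common rule of thumb, not a figure given in this text. Applied to the data of Illustration 30, it flags the cell whose expected frequency is 4.

```python
def chi_square_warnings(observed, expected, min_expected=5):
    """Return a list of warnings about common misuses of the chi-square test."""
    warnings = []
    obs_cells = [x for row in observed for x in row]
    exp_cells = [x for row in expected for x in row]
    # Use of non-frequency data: counts must be whole, non-negative numbers
    if any(x < 0 or not float(x).is_integer() for x in obs_cells):
        warnings.append("observed values are not frequencies (counts)")
    # Sum of observed and sum of theoretical frequencies must agree
    if abs(sum(obs_cells) - sum(exp_cells)) > 1e-6:
        warnings.append("sum of observed and expected frequencies differ")
    # Small theoretical frequencies
    small = sum(1 for e in exp_cells if e < min_expected)
    if small:
        warnings.append(f"{small} cell(s) with expected frequency below {min_expected}")
    return warnings


observed = [[86, 60, 44, 10], [40, 33, 25, 2]]   # Illustration 30
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)
expected = [[r * c / n for c in cols] for r in rows]
print(chi_square_warnings(observed, expected))
# ['1 cell(s) with expected frequency below 5']
```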

The number of applications of the chi-square test does not seem to be increasing, but the number of misuses of χ² has become surprisingly large. A lesson can be drawn from the findings of the Lewis and Burke* article: you cannot simply use a statistic because you know how to calculate it; you must understand the rationale behind its development and the limitations on its application imposed by the assumptions underlying it.

Limitations on the Use of χ² Test

The χ² test is very widely used in practice. However, in order to avoid misapplication of the test, its following limitations should be kept in mind:

1. Frequencies of non-occurrence should not be omitted for binomial or multinomial events. For example, if 5 drugs were tried out on 5 separate groups of 200 patients each, the number of cures per drug might be shown in a one-way table as follows:

DATA FOR FIVE DRUGS

Drug              1     2     3     4     5    Total
Number cured     80   120    40    60    20     320

However, the χ² test should not be applied to these data until the alternative outcome (i.e., "not cured") is represented in the table.

2. The formula presented for the χ² statistic is in terms of frequencies. Hence no attempt should be made to compute it on the basis of proportions or other derived measures.

3. The formula presented in this chapter is not appropriate for cases in which repeated measurements on the same or matched groups are represented in one table. When data from questionnaires and similar devices are analysed, the reader should be careful not to set up the tables incorrectly. For example, it may seem reasonable to set up a table as follows:

            Agree   Neutral   Disagree   Total
Item X:      140      190       170       500
Item Y:      180      150       170       500

However, a χ² contingency test should not be performed on the basis of this table, since it is not really a contingency table: each student is classified twice in the table.
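Limitations 1 and 2 can be seen numerically in the short Python sketch below (helper name ours). It first completes the five-drug table with the "not cured" row, taking the first drug's count as 80, the value consistent with the stated total of 320. It then recomputes the χ² of Illustration 28 from proportions instead of frequencies. Because χ² grows in direct proportion to the number of observations, the proportion-based value is exactly 1/120 of the correct one, which is why only actual frequencies may be used.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over an r x c table, E = row total * column total / N."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    total = 0.0
    for row, r in zip(observed, rows):
        for o, c in zip(row, cols):
            e = r * c / n
            total += (o - e) ** 2 / e
    return total


# Limitation 1: add the frequencies of non-occurrence before testing.
cured = [80, 120, 40, 60, 20]            # cures out of 200 patients per drug
not_cured = [200 - x for x in cured]     # [120, 80, 160, 140, 180]
drug_table = [cured, not_cured]          # 2 x 5 table, (2-1)(5-1) = 4 d.f.
chi_drugs = chi_square(drug_table)       # about 136.03

# Limitation 2: chi-square depends on the absolute frequencies.
counts = [[24, 52], [32, 12]]            # Illustration 28
n = sum(map(sum, counts))
proportions = [[x / n for x in row] for row in counts]
chi_counts = chi_square(counts)          # about 18.96
chi_props = chi_square(proportions)      # about 0.158, i.e. 1/120 of the above

print(round(chi_drugs, 2), round(chi_counts, 2), round(chi_props, 3))
```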




* For details please refer to Readings in Statistics by Joseph Stegar. 


5 Analysis of Variance 


One of the most powerful tools of statistical analysis is what is known as "analysis of variance". Statistical method may be regarded as a body of techniques for the study of variation in nature. A systematic procedure for the analysis of variation (or variance), developed by R.A. Fisher, is capable of fruitful application to a diversity of practical problems. Basically, it consists of classifying and cross-classifying statistical results and testing whether the means of a specified classification differ significantly. In this way it is determined whether the given classification is important in affecting the results. For example, the output of a given process might be cross-classified by machines and operators (each operator having worked on each machine). From this cross-classification it could be determined whether the mean qualities of the outputs of the various machines differed significantly. Also, it could independently be determined whether the mean qualities of the outputs of the various operators differed significantly. Such a study would determine, for example, whether uniformity in quality of outputs could be increased by standardizing the procedures of the operators (say, through special training) and similarly whether it could be increased by standardizing the machines (say, through resetting). Analysis of variance thus enables us to analyse the total variation of our data into components which may be attributed to various "sources" or "causes" of variation.

Role of the Concept of Analysis of Variance

In the chapter on "Sampling and Tests of Significance", the t-test of the difference of means was discussed. However, this test is an adequate procedure for testing the null hypothesis only when we have the means of two samples to consider. In a situation where we have three or more samples to consider at a time, an alternative procedure is needed for testing the hypothesis that all the samples could likely have been drawn from the same population. For example, five fertilizers are applied to four plots each of wheat and we are given the yield of wheat on each of these plots. We may be interested in finding out whether the effects of these fertilizers on the yields are significantly different or, in other words, whether the samples have come from the same universe. The answer to this problem is provided by the technique of analysis of variance.

The analysis of variance originated in agrarian research and its language is thus loaded with such agricultural terms as blocks (referring to land) and treatments (referring to populations or samples), which are differentiated in terms of varieties of seed, fertilizers or cultivation methods. Today the procedure of this analysis finds application in nearly every type of experimental design, in the natural sciences as well as the social sciences. In fact it has come to acquire a place of great prominence in statistical analysis. This is because the analysis of variance is amazingly versatile: it can be readily adapted to furnish, within broad limits, a proper evaluation of data obtained from a large body of experiments which involve several continuous random variables. It can tell us whether differences among sample data classified in terms of a single variable are meaningful. It can also provide us with meaningful comparisons of sample data which are classified according to two or more variables.

The reader should keep in mind that the analysis of variance test discussed here is not intended to serve the ultimate purpose of testing for the significance of the difference between two sample variances; rather, its purpose is to test for the significance of the differences among sample means. It does this via the mechanism of the F-test for the significance of the difference between two variances, but the test is so designed that the variances being compared are expected to differ only if the means under consideration are not homogeneous. In this way, significant values of F indicate that the means are significantly different from one another.


Assumptions in Analysis of Variance 


The distribution of F-values is known to take on certain characteristics with different combinations of degrees of freedom. However, these characteristics are present only under certain conditions, which we assume to hold. These conditions are:

(1) that all populations from which samples have been drawn are normally distributed;

(2) that the variances of the populations from which samples have been drawn are equal; and

(3) that the individuals being observed have been randomly selected from the populations represented by the samples.

The values in the table given at the end are precise only when we have met the above assumptions. However, in actual practice it has been observed that one or more of these assumptions can be "bent" without appreciable loss in the adequacy of the F-test. The researcher strives to meet the assumptions of the F-test, but he usually finds that if the data are reasonably close to meeting the assumptions, his conclusions based on the F-test are not markedly affected.


The greater the variance between the sample means, i.e., the more widely, relatively speaking, the sample means are dispersed around the grand mean, the less likely it is that the samples are random samples from the same population. However, if the sample means are very narrowly dispersed around the grand mean compared with the dispersion of the items around their own sample means, the samples are likely to be random samples from a common population.

Technique of Analysis of Variance 


For the sake of clarity the technique of analysis of variance has been discussed separately for (a) one-way classification, and (b) two-way classification.


One-way Classification


In one-way classification the data are classified according to only one criterion. The null hypothesis is

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
$$H_1: \text{not all } \mu_j \text{ are equal}$$

that is, the arithmetic means of the populations from which the k samples were randomly drawn are equal to one another. The steps in carrying out the analysis are:

1. Calculate the variance between the samples. The variance (sum of squares) between samples reflects the contribution of both different treatments and chance to inter-sample variability. Individual observations in the same treatment sample, however, can differ from each other only because of chance variation, since each individual within the group receives exactly the same treatment. The sum of squares between samples is denoted by SSC. For calculating the variance between the samples we take the total of the squares of the deviations of the means of the various samples from the grand average and divide this total by the degrees of freedom. Thus the steps in calculating the variance between samples will be:

(a) Calculate the mean of each sample, i.e., $\bar X_1, \bar X_2$, etc.;

(b) Calculate the grand average $\bar{\bar X}$. Its value is obtained as follows:

$$\bar{\bar X} = \frac{\sum X_1 + \sum X_2 + \cdots}{N_1 + N_2 + \cdots}$$

(c) Take the difference between the means of the various samples and the grand average;

(d) Square these deviations, multiply each squared deviation by the number of items in the corresponding sample, and obtain the total, which gives the sum of squares between the samples; and

(e) Divide the total obtained in step (d) by the degrees of freedom. The degrees of freedom will be one less than the number of samples, i.e., if there are 4 samples the degrees of freedom will be 4 − 1 = 3, or v = k − 1, where k = number of samples.

2. Calculate the variance within the samples. The variance (or sum of squares) within samples measures those differences within the samples that are due to chance only. It is denoted by SSE. For calculating the variance within the samples we take the total of the sums of squares of the deviations of the various items from the mean values of the respective samples and divide this total by the degrees of freedom. Thus, the steps in calculating the variance within the samples will be:

(a) Calculate the mean value of each sample, i.e., $\bar X_1, \bar X_2$, etc.;

(b) Take the deviations of the various items in a sample from the mean value of the respective sample;

(c) Square these deviations and obtain the total, which gives the sum of squares within the samples; and

(d) Divide the total obtained in step (c) by the degrees of freedom. The degrees of freedom are obtained by deducting the number of samples from the total number of items, i.e., v = N − k, where k refers to the number of samples and N refers to the total number of all the observations.


3. Calculate the ratio F as follows:

$$F = \frac{\text{Variance between the samples}}{\text{Variance within the samples}}$$

Symbolically:

$$F = \frac{\text{MSC}}{\text{MSE}} = \frac{\text{SSC}/(k-1)}{\text{SSE}/(N-k)}$$

F is always computed with the variance between the sample means as the numerator and the variance within the samples as the denominator. This denominator is computed by combining the variances within the k samples into a single measure.

The variance between the sample means is always placed in the numerator so that, if the hypothesis is true, F will tend to be equal to 1 within the limits of chance variation (the critical limits of F for such chance variation are given in Appendix III, Table XI); and, if the hypothesis is not true, the computed value of F will tend to be greater than 1.

If F is found to be significantly greater than 1, as determined by comparison with the values given in the aforesaid table, i.e., if the differences between the sample means are found to be significantly larger than the differences of the individual observations within the individual samples, then the sample means are significantly different from one another, and the hypothesis is rejected. By judicious and imaginative application of this test, remarkably useful results can be obtained.


4. Compare the calculated value of F with the table value of F for the given degrees of freedom at a certain critical level (generally we take the 5% level of significance). If the calculated value of F is greater than the table value, it is concluded that the difference in sample means is significant, i.e., it could not have arisen due to fluctuations of simple sampling or, in other words, the samples do not come from the same population. On the other hand, if the calculated value of F is less than the table value, the difference is said to be not significant and due to fluctuations of simple sampling.
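Steps 1 to 4 can be collected into a short Python sketch. It follows the text in putting the between-sample variance in the numerator. The F table value must still be looked up by the user and passed in, since the sketch does not compute F percentage points. The function name is ours, and the data used are those of Illustration 1 below.

```python
def one_way_anova(samples, f_table):
    """Steps 1-4 of the one-way analysis of variance.

    samples : list of lists of observations (sample sizes may differ)
    f_table : tabulated F for (k - 1, N - k) d.f. at the chosen level,
              read from an F table by the user
    """
    k = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n

    # Step 1: variance between samples (each squared deviation counted n_j times)
    ssc = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    msc = ssc / (k - 1)

    # Step 2: variance within samples
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mse = sse / (n - k)

    # Step 3: F ratio, between-sample variance in the numerator
    f = msc / mse

    # Step 4: significant only if F exceeds the table value
    return {"SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse,
            "F": f, "significant": f > f_table}


schools = [
    [8, 10, 12, 8, 7],      # A
    [12, 11, 9, 14, 4],     # B
    [18, 12, 16, 6, 8],     # C
    [13, 9, 12, 16, 15],    # D
]
result = one_way_anova(schools, f_table=3.24)   # F(3, 16) at 5%, from the text
print(result["SSC"], result["SSE"], round(result["F"], 2), result["significant"])
# 50.0 208.0 1.28 False
```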

It is customary to summarize calculations for sums of squares, together with their numbers of degrees of freedom and mean squares, in a table called the analysis of variance table.


Analysis of Variance Table: One-way Classification Model

Source of variation          SS                 v                       MS
                             (Sum of squares)   (Degrees of freedom)    (Mean squares)
Between samples (columns)    SSC                k − 1                   MSC = SSC/(k − 1)
Within samples               SSE                N − k                   MSE = SSE/(N − k)
Total                        SST                N − 1

SSC = Sum of squares between samples
SSE = Sum of squares within samples
SST = Total sum of squares
MSC = Mean square between samples
MSE = Mean square within samples

Note. The same procedure for analysis of variance is applicable to both equal and unequal sample sizes.

Rationale of the test. The variation within the samples, i.e., the variation of the individual observations within the samples from their own sample means, measures the influence of the chance forces which cause the individual observations to vary from one another. However, the variation of the k sample means from the grand mean of all the samples taken together, i.e., the variation between the sample means, reflects not only the effect of these same chance forces but also the effect of the forces, if any, which cause the various sample means to differ from one another. Thus, if there is any such force, i.e., if the hypothesis is not true, the variation between the sample means will tend to be larger than the variation within the samples. This is precisely what the test is designed to identify.
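The argument can be checked by simulation. The Python sketch below (hypothetical helpers using only the standard `random` module) draws four samples of five items from normal populations with standard deviation 1. When all population means are equal, the average F is close to 1 (for 3 and 16 degrees of freedom its theoretical mean is 16/14 ≈ 1.14); when the means differ, the between-sample variance, and with it F, becomes much larger. The seed is fixed only so that the run is repeatable.

```python
import random


def f_ratio(samples):
    """One-way F: mean square between samples / mean square within samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand = sum(sum(s) for s in samples) / n
    ssc = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ssc / (k - 1)) / (sse / (n - k))


def average_f(population_means, reps=2000, size=5, seed=1):
    """Average F over `reps` simulated experiments, one sample per population."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        samples = [[rng.gauss(mu, 1.0) for _ in range(size)]
                   for mu in population_means]
        total += f_ratio(samples)
    return total / reps


f_null = average_f([0, 0, 0, 0])   # hypothesis true: F near 1
f_alt = average_f([0, 0, 1, 2])    # hypothesis false: F well above 1
print(round(f_null, 2), round(f_alt, 2))
```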

The following example will illustrate the procedure : 


Illustration 1. To assess the significance of possible variation in performance in a certain test between the grammar schools of a city, a common test was given to a number of students taken at random from the senior fifth class of each of the four schools concerned. The results are given below. Make an analysis of variance of the data.

          Schools
 A      B      C      D
 8     12     18     13
10     11     12      9
12      9     16     12
 8     14      6     16
 7      4      8     15
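Solution:

         Sample I   Sample II   Sample III   Sample IV
            X1         X2          X3           X4
             8         12          18           13
            10         11          12            9
            12          9          16           12
             8         14           6           16
             7          4           8           15
Total       45         50          60           65
Mean         9         10          12           13

Grand mean: $\bar{\bar X} = \dfrac{\bar X_1 + \bar X_2 + \bar X_3 + \bar X_4}{k} = \dfrac{9 + 10 + 12 + 13}{4} = \dfrac{44}{4} = 11$

where $\bar X_1, \bar X_2$, etc., represent the mean of each sample and k the number of samples (the simple average of the means can be used here because all samples are of equal size),

or grand mean of all samples = (45 + 50 + 60 + 65)/20 = 220/20 = 11.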


Variance between samples
To obtain the variation between samples, calculate the square of the deviation of the mean of each sample from the grand average. The mean of sample I is 9 but the grand mean is 11, so we take the difference between 9 and 11 and square it. Similarly, for sample II the mean is 10 but the grand average is 11, so we take the difference between 10 and 11 and square it. Each squared deviation is entered once for every item in the sample. Thus we have the following table:

 Sample I      Sample II     Sample III    Sample IV
(X̄1 − X̿)²    (X̄2 − X̿)²    (X̄3 − X̿)²    (X̄4 − X̿)²
     4             1             1             4
     4             1             1             4
     4             1             1             4
     4             1             1             4
     4             1             1             4
    20             5             5            20


Sum of squares between the samples = 20 + 5 + 5 + 20 = 50
Mean square between samples = 50/(4 − 1) = 16.67

Variance within samples
To obtain the variation within samples, calculate the square of the deviation of each item from the mean value of its own sample: the items of sample I from 9, those of sample II from 10, those of sample III from 12 and those of sample IV from 13.

X1  (X1 − X̄1)²   X2  (X2 − X̄2)²   X3  (X3 − X̄3)²   X4  (X4 − X̄4)²
 8       1       12       4       18      36       13       0
10       1       11       1       12       0        9      16
12       9        9       1       16      16       12       1
 8       1       14      16        6      36       16       9
 7       4        4      36        8      16       15       4
       =16              =58              =104              =30

Total sum of squares within the samples = 16 + 58 + 104 + 30 = 208
Mean square within samples = 208/(20 − 4) = 13

It is advisable to check the calculations by finding the total variation. The total variation is calculated by taking the squares of the deviations of each of the items from the grand average (11).

X1  (X1 − X̿)²   X2  (X2 − X̿)²   X3  (X3 − X̿)²   X4  (X4 − X̿)²
 8      9       12      1       18     49       13      4
10      1       11      0       12      1        9      4
12      1        9      4       16     25       12      1
 8      9       14      9        6     25       16     25
 7     16        4     49        8      9       15     16
      =36             =63             =109             =50

Total sum of squares = 36 + 63 + 109 + 50 = 258 = 50 + 208

Analysis of Variance Table

Source of variation   Sum of squares   Degrees of freedom   Mean square
Between samples             50                 3               16.67
Within samples             208                16               13
Total                      258                19

F = Variance between samples / Variance within samples = 16.67/13 = 1.28

From Appendix III, Table XI, the critical value of F for v1 = 3 and v2 = 16 at 5% level of significance = 3.24. The calculated value of F is less than the critical value and hence the difference in the mean values of the samples is not significant, i.e., the samples could have come from the same universe.


Short-cut Method. The above method of calculating the sum of squares for variance between samples and variance within samples is not generally followed in practice because it is time-consuming. An easier method, known as the short-cut method, is usually followed, which reduces the computational work considerably. The computations by the short-cut method shall be as follows:

   Sample I        Sample II       Sample III      Sample IV
  X1     X1²      X2     X2²      X3     X3²      X4     X4²
   8      64      12     144      18     324      13     169
  10     100      11     121      12     144       9      81
  12     144       9      81      16     256      12     144
   8      64      14     196       6      36      16     256
   7      49       4      16       8      64      15     225
ΣX1=45  ΣX1²=421  ΣX2=50  ΣX2²=558  ΣX3=60  ΣX3²=824  ΣX4=65  ΣX4²=875

The sum of all the items of the various samples:
T = ΣX1 + ΣX2 + ΣX3 + ΣX4 = 45 + 50 + 60 + 65 = 220

Correction factor = T²/N = (220)²/20 = 48400/20 = 2420

Total sum of squares = ΣX1² + ΣX2² + ΣX3² + ΣX4² − T²/N
= 421 + 558 + 824 + 875 − 2420 = 2678 − 2420 = 258 (as before)

Sum of squares between the samples is obtained as follows:
(ΣX1)²/N1 + (ΣX2)²/N2 + (ΣX3)²/N3 + (ΣX4)²/N4 − T²/N
= (45)²/5 + (50)²/5 + (60)²/5 + (65)²/5 − 2420
= 405 + 500 + 720 + 845 − 2420 = 2470 − 2420 = 50 (as before)

Sum of squares within samples = Total sum of squares − Sum of squares between samples
= 258 − 50 = 208 (as before)
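The short-cut method needs only the sample totals, the sum of squares of the raw items and the correction factor T²/N. A minimal Python sketch (function name ours), checked against the figures just obtained:

```python
def shortcut_sums_of_squares(samples):
    """Short-cut method: correction factor, total, between and within SS."""
    n = sum(len(s) for s in samples)
    t = sum(sum(s) for s in samples)                         # grand total T
    cf = t ** 2 / n                                          # correction factor T^2/N
    sst = sum(x * x for s in samples for x in s) - cf        # total SS
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - cf    # between samples
    sse = sst - ssc                                          # within samples
    return cf, sst, ssc, sse


schools = [[8, 10, 12, 8, 7], [12, 11, 9, 14, 4],
           [18, 12, 16, 6, 8], [13, 9, 12, 16, 15]]
cf, sst, ssc, sse = shortcut_sums_of_squares(schools)
print(cf, sst, ssc, sse)   # 2420.0 258.0 50.0 208.0
```

Only squares of raw items and of totals are needed, which is why the method lends itself to desk calculation.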
Coding of Data

While making an analysis of variance it should be noted that the final quantity tested is a ratio and so dimensionless. This means that the original measurements can be coded* to simplify the work, without the need for any subsequent adjustment of the results. The data of Illustration 1, coded by subtracting 10 from each item, are given below and the calculations are done on the coded values.

* Coding refers to the addition, multiplication, subtraction or division of the data by a constant.

CODED DATA

  A      B      C      D
 X1     X2     X3     X4
 −2     +2     +8     +3
  0     +1     +2     −1
 +2     −1     +6     +2
 −2     +4     −4     +6
 −3     −6     −2     +5
ΣX1 = −5   ΣX2 = 0   ΣX3 = 10   ΣX4 = 15
X̄1 = −1    X̄2 = 0    X̄3 = 2     X̄4 = 3

Grand mean X̿ = (X̄1 + X̄2 + X̄3 + X̄4)/4 = (−1 + 0 + 2 + 3)/4 = 1

Sum of squares between samples

(X̄1 − X̿)²   (X̄2 − X̿)²   (X̄3 − X̿)²   (X̄4 − X̿)²
     4            1            1            4
     4            1            1            4
     4            1            1            4
     4            1            1            4
     4            1            1            4
    =20          =5           =5          =20

Sum of squares between samples = 20 + 5 + 5 + 20 = 50
Mean square between samples = 50/3 = 16.67 [as before]

Sum of squares within samples

 X1  (X1 − X̄1)²   X2  (X2 − X̄2)²   X3  (X3 − X̄3)²   X4  (X4 − X̄4)²
 −2      1        +2      4        +8     36        +3      0
  0      1        +1      1        +2      0        −1     16
 +2      9        −1      1        +6     16        +2      1
 −2      1        +4     16        −4     36        +6      9
 −3      4        −6     36        −2     16        +5      4
        =16              =58              =104              =30

Total sum of squares within the samples = 16 + 58 + 104 + 30 = 208
Mean square within the samples = 208/(20 − 4) = 208/16 = 13 [as before]


The benefit of coding can be appreciated better if we have large figures. Thus, if the figures are:

Sample I      740   742   848   660   762
Sample II     745   650   758   664   754
Sample III    788    59   652   720   738

we can subtract 700 from each of the values and then carry out the analysis of variance. This would simplify the calculations considerably.
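The effect of coding on the F ratio can be checked directly. In the Python sketch below (helper name ours) the data of Illustration 1 are analysed three times: raw, after subtracting 10 as in the coded table, and after subtracting 10 and halving. Subtracting a constant leaves both sums of squares unchanged; dividing by 2 divides each sum of squares by 4; in every case F is the same.

```python
def anova_parts(samples):
    """Return (SS between, SS within, F) for a one-way classification."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand = sum(sum(s) for s in samples) / n
    ssc = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return ssc, sse, (ssc / (k - 1)) / (sse / (n - k))


raw = [[8, 10, 12, 8, 7], [12, 11, 9, 14, 4], [18, 12, 16, 6, 8], [13, 9, 12, 16, 15]]
coded = [[x - 10 for x in s] for s in raw]          # subtract a constant
halved = [[(x - 10) / 2 for x in s] for s in raw]   # subtract, then divide by 2

for name, data in (("raw", raw), ("coded", coded), ("halved", halved)):
    ssc, sse, f = anova_parts(data)
    print(name, ssc, sse, round(f, 4))
# raw 50.0 208.0 1.2821
# coded 50.0 208.0 1.2821
# halved 12.5 52.0 1.2821
```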
Illustration 2. The following figures relate to production in kg. of three varieties A, B and C of wheat sown in 12 plots:

A:  14, 16, 18
B:  14, 13, 15, 22
C:  18, 16, 19, 19, 20

Is there any significant difference in the production of the three varieties?  (B.A., Bombay, 1971)


Solution. Let us take the hypothesis that there is no significant difference in the production of the three varieties. Carrying out the analysis of variance after subtracting 15 from each item:

        A       B       C
       X1      X2      X3
       −1      −1      +3
       +1      −2      +1
       +3       0      +4
               +7      +4
                       +5
Total  ΣX1 = +3   ΣX2 = +4   ΣX3 = +17
Mean   X̄1 = 1     X̄2 = 1     X̄3 = 3.4

Since the samples are of unequal size, the grand mean is obtained from the grand total:

Grand mean X̿ = (ΣX1 + ΣX2 + ΣX3)/(N1 + N2 + N3) = (3 + 4 + 17)/12 = 24/12 = 2

Sum of squares between samples

(X̄1 − X̿)²   (X̄2 − X̿)²   (X̄3 − X̿)²
     1            1           1.96
     1            1           1.96
     1            1           1.96
                  1           1.96
                              1.96
    =3           =4          =9.80

Sum of squares between samples = 3 + 4 + 9.80 = 16.80

Sum of squares within samples

 X1  (X1 − X̄1)²   X2  (X2 − X̄2)²   X3  (X3 − X̄3)²
 −1      4        −1      4        +3     0.16
 +1      0        −2      9        +1     5.76
 +3      4         0      1        +4     0.36
                  +7     36        +4     0.36
                                   +5     2.56
        =8               =50              =9.20

Sum of squares within samples = 8 + 50 + 9.20 = 67.20

ANALYSIS OF VARIANCE TABLE

Source of variation   Sum of squares   Degrees of freedom   Mean square
Between samples           16.80                2               8.40
Within samples            67.20                9               7.467
Total                     84.00               11

F = 8.40/7.467 = 1.125

The table value of F for v1 = 2 and v2 = 9 at 5% level of significance is 4.26. The calculated value of F is less than the table value. Hence there is no evidence to doubt the hypothesis. We therefore conclude that the three varieties do not differ significantly in production.
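With samples of unequal size, the grand mean must be taken from the grand total. The Python sketch below (our own check, using the uncoded production figures of Illustration 2) shows that only this weighted grand mean makes the between-sample and within-sample sums of squares add up to the total sum of squares. The simple average of the three sample means (16.8 kg) would give 17.28 instead of 16.80.

```python
wheat = [
    [14, 16, 18],           # variety A
    [14, 13, 15, 22],       # variety B
    [18, 16, 19, 19, 20],   # variety C
]
means = [sum(s) / len(s) for s in wheat]                   # 16, 16, 18.4
weighted = sum(map(sum, wheat)) / sum(map(len, wheat))     # 204 / 12 = 17
unweighted = sum(means) / len(means)                       # 16.8, mean of means


def ss_between(samples, grand_mean):
    """Sum over samples of n_j * (sample mean - grand mean) ** 2."""
    return sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)


sst = sum((x - weighted) ** 2 for s in wheat for x in s)           # total SS
sse = sum((x - m) ** 2 for s, m in zip(wheat, means) for x in s)   # within SS
ssc = ss_between(wheat, weighted)                                  # between SS

print(round(sst, 2), round(ssc, 2), round(sse, 2))   # 84.0 16.8 67.2
print(round(ss_between(wheat, unweighted), 2))       # 17.28, does not add up
f = (ssc / (3 - 1)) / (sse / (12 - 3))
print(round(f, 3))                                   # 1.125, below 4.26
```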


Analysis of Variance in Two-way Classification

In a two-way classification the data are classified according to two different criteria or factors. The procedure for analysis of variance is somewhat different from the one followed while dealing with problems of one-way classification. In two-way classification the analysis of variance table takes the following form:

Source of variation   Sum of squares   Degrees of freedom   Mean square                    F
Between columns       SSC              (c − 1)              MSC = SSC/(c − 1)              MSC/MSE
Between rows          SSR              (r − 1)              MSR = SSR/(r − 1)              MSR/MSE
Residual or error     SSE              (c − 1)(r − 1)       MSE = SSE/[(c − 1)(r − 1)]
Total                 SST              (cr − 1)

SSC = Sum of squares between columns
SSR = Sum of squares between rows
SSE = Sum of squares due to error (residual)
SST = Total sum of squares

The sum of squares for the source "Residual" is obtained by subtracting from the total sum of squares the sums of squares between columns and between rows.

The total number of degrees of freedom = cr − 1, where c refers to the number of columns and r to the number of rows.

Number of degrees of freedom between columns = (c − 1)
Number of degrees of freedom between rows = (r − 1)
Number of degrees of freedom for residual = (c − 1)(r − 1)
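The two-way procedure, with one observation in each cell, can be sketched in Python as below. This is our own minimal implementation (function name hypothetical), working from row and column means rather than from totals. It is applied to the sales figures of Illustration 3 below, with salesmen as columns and seasons as rows.

```python
def two_way_anova(table):
    """Two-way classification, one item per cell: table[row][column].

    Returns {source: (sum of squares, degrees of freedom)}.
    """
    r, c = len(table), len(table[0])
    n = r * c
    grand = sum(map(sum, table)) / n
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_res = ss_total - ss_cols - ss_rows          # residual by subtraction
    return {
        "columns": (ss_cols, c - 1),
        "rows": (ss_rows, r - 1),
        "residual": (ss_res, (c - 1) * (r - 1)),
        "total": (ss_total, n - 1),
    }


sales = [              # columns: salesmen A, B, C, D
    [36, 36, 21, 35],  # summer
    [28, 29, 31, 32],  # winter
    [26, 28, 29, 29],  # monsoon
]
result = two_way_anova(sales)
ms_res = result["residual"][0] / result["residual"][1]
for source in ("columns", "rows"):
    ss, df = result[source]
    print(source, ss, df, round((ss / df) / ms_res, 2))
print(result["residual"], result["total"])
# columns 42.0 3 0.62
# rows 32.0 2 0.71
# (136.0, 6) (210.0, 11)
```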


The total sum of squares, the sum of squares for "between columns" and the sum of squares for "between rows" are obtained in the same way as before, and

Residual (error or remainder) = Total sum of squares − Sum of squares between columns − Sum of squares between rows.

In any problem involving two-way classification, the residual is the measuring rod for testing significance. It represents the amount of variation due to forces called "chance". The following examples illustrate the procedure:


Illustration 3. A tea company appoints four salesmen A, B, C and D and observes their sales in three seasons: summer, winter and monsoon. The figures (in lakhs) are given in the following table:

                      Salesmen
Season           A     B     C     D    Season totals
Summer          36    36    21    35        128
Winter          28    29    31    32        120
Monsoon         26    28    29    29        112
Salesmen totals 90    93    81    96        360

Carry out an analysis of variance.

Solution. The above data are classified according to two criteria: (i) salesmen and (ii) seasons. In order to simplify calculations we code the data by subtracting 30 from each figure:

Season      A     B     C     D    Total
Summer     +6    +6    −9    +5     +8
Winter     −2    −1    +1    +2      0
Monsoon    −4    −2    −1    −1     −8
Total       0    +3    −9    +6      0

Correction factor = T²/N = (0)²/12 = 0

Sum of squares between salesmen
This is obtained by squaring the salesmen totals, dividing each by the number of items that make up the total, adding all such figures and subtracting the correction factor:
= (0)²/3 + (3)²/3 + (−9)²/3 + (6)²/3 − 0 = 0 + 3 + 27 + 12 = 42

Sum of squares between seasons
This is obtained by squaring the season totals, dividing each by the number of items that make up the total, adding all such figures and subtracting the correction factor:
= (8)²/4 + (0)²/4 + (−8)²/4 − 0 = 16 + 0 + 16 = 32

Total sum of squares
This is obtained by adding the squares of all the coded items and subtracting the correction factor:
= (36 + 36 + 81 + 25) + (4 + 1 + 1 + 4) + (16 + 4 + 1 + 1) − 0 = 210

Residual sum of squares = 210 − (42 + 32) = 136
Total degrees of freedom v = 12 − 1 = 11

The above information is presented in the following table of Analysis of Variance:

Source of variation            Sum of squares   d.f.   Mean square
Between columns (salesmen)           42           3        14
Between rows (seasons)               32           2        16
Residual                            136           6        22.67
Total                               210          11


| 
Let us take the hypothesis that there is no difference between the sales of the salesmen or between the seasons or, in other words, that the three independent estimates of variance are estimates of the variance of a common population.

First compare the salesmen variance estimate with the residual variance estimate:

F = 14/22.67 = 0.62

The table value of F for v1 = 3 and v2 = 6 at 5% level of significance is 4.76. The calculated value is less than this and we conclude that the sales of different salesmen do not differ significantly.

Now compare the season variance estimate with the residual variance estimate:

F = 16/22.67 = 0.71

The critical value of F for v1 = 2 and v2 = 6 at 5% level of significance is 5.14. The calculated value is less than this and hence there is no significant difference in the seasons as far as the sales are concerned.

Thus the test shows that the salesmen and the seasons are alike so far as the sales are concerned.
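The variance-ratio arithmetic above can be checked mechanically. The short Python sketch below takes the sums of squares and degrees of freedom from the table, forms the mean squares and divides the greater by the smaller, as the text does. The critical values 4.76 and 5.14 are simply the ones quoted in the text. The code does not look them up.

```python
# Sums of squares and degrees of freedom from the Illustration 3 table
TABLE = {
    "salesmen": (42, 3),
    "seasons": (32, 2),
    "residual": (136, 6),
}

# 5% critical values as quoted in the text (not computed here)
CRITICAL_F = {"salesmen": 4.76, "seasons": 5.14}


def variance_ratio(a, b):
    """Greater variance divided by smaller variance, as in the text."""
    return max(a, b) / min(a, b)


mean_sq = {name: ss / df for name, (ss, df) in TABLE.items()}

results = {}
for factor in ("salesmen", "seasons"):
    f = variance_ratio(mean_sq[factor], mean_sq["residual"])
    results[factor] = f
    verdict = "significant" if f > CRITICAL_F[factor] else "not significant"
    print(f"{factor}: F = {f:.2f}, {verdict} at 5%")
```

Both ratios fall below the quoted critical values, which matches the conclusion in the text.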


Illustration 4. Three varieties A, B, C of a crop are tested in a randomized block design with four replications. The yields are given below in pounds:

                 Replications (blocks)
Variety      1     2     3     4     Total
A            6     4     8     6       24
B            7     6     6     9       28
C            8     5    10     9       32

Test whether there are differences between varieties. Test also whether the yield of A differs significantly from that of B.
(B. Com., Bombay, 1972)
Solution. Here three varieties A, B, C of a crop are tested in a randomized block design in 4 blocks or replications (B1, B2, B3 and B4). Within each replication it is expected that there may not be any difference of soil, but from one block to another the difference in production can be due to soil differences as well.

Let variety A be denoted by V1, variety B by V2, variety C by V3 and replications (blocks) 1, 2, 3, 4 by B1, B2, B3, B4 respectively.


The following calculations will be made:

Correction factor (C.F.) = T²/N = (84)²/12 = 588

Sum of squares between varieties* = [(ΣV1)² + (ΣV2)² + (ΣV3)²]/4 − C.F.
= [(24)² + (28)² + (32)²]/4 − 588
= 2384/4 − 588
= 596 − 588 = 8

Sum of squares between blocks** = [(ΣB1)² + (ΣB2)² + (ΣB3)² + (ΣB4)²]/3 − C.F.
= [(21)² + (15)² + (24)² + (24)²]/3 − 588
= 1818/3 − 588
= 606 − 588 = 18

Total sum of squares = [(6)² + (7)² + (8)² + (4)² + (6)² + (5)² + (8)² + (6)² + (10)² + (6)² + (9)² + (9)²] − C.F.
= [36 + 49 + 64 + 16 + 36 + 25 + 64 + 36 + 100 + 36 + 81 + 81] − 588
= 624 − 588 = 36
Residual sum of squares = 36 − (18 + 8) = 36 − 26 = 10

ANALYSIS OF VARIANCE TABLE

Source of variation   Sum of squares   Degrees of freedom        Mean square   F
Between varieties           8          3 − 1 = 2                   4           4/1.667 = 2.4
Between blocks             18          4 − 1 = 3                   6           6/1.667 = 3.6
Residual                   10          (12 − 1) − (3 + 2) = 6      1.667


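The table value of F for ν1 = 2 and ν2 = 6 at 5% level of significance is 5.14, and for ν1 = 3 and ν2 = 6 it is 4.76. In both cases the calculated value is less than the critical value and hence the variances between varieties and between blocks do not differ significantly from the variance due to random errors. Thus the yield of A does not differ significantly from that of B.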
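The randomized block calculations of Illustration 4 can be reproduced with a short Python sketch. It follows the text's correction-factor method. Each variety total is divided by the number of blocks, and each block total by the number of varieties. The function name `randomized_block_anova` is illustrative only.

```python
def randomized_block_anova(data):
    """Two-way ANOVA without replication (rows = varieties, columns = blocks).

    Returns sums of squares, residual d.f. and F ratios (each mean square
    divided by the residual mean square).
    """
    v, b = len(data), len(data[0])
    n = v * b
    cf = sum(sum(row) for row in data) ** 2 / n               # correction factor
    # A variety total is based on b items, a block total on v items
    ss_var = sum(sum(row) ** 2 for row in data) / b - cf
    block_totals = [sum(row[j] for row in data) for j in range(b)]
    ss_blk = sum(t ** 2 for t in block_totals) / v - cf
    ss_tot = sum(x ** 2 for row in data for x in row) - cf
    ss_res = ss_tot - ss_var - ss_blk
    df_var, df_blk = v - 1, b - 1
    df_res = (n - 1) - (df_var + df_blk)
    ms_res = ss_res / df_res
    return {
        "cf": cf, "ss_var": ss_var, "ss_blk": ss_blk,
        "ss_tot": ss_tot, "ss_res": ss_res, "df_res": df_res,
        "f_var": (ss_var / df_var) / ms_res,
        "f_blk": (ss_blk / df_blk) / ms_res,
    }


# Yields in lb from Illustration 4: varieties A, B, C by blocks 1-4
yields = [
    [6, 4, 8, 6],
    [7, 6, 6, 9],
    [8, 5, 10, 9],
]
res = randomized_block_anova(yields)
print({key: round(value, 3) for key, value in res.items()})
```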


Illustration 5. The following data represent the number of units of production per day turned out by 5 different workers using 4 different types of machines:

                 Machine type
Workers       A     B     C     D
1            44    38    47    36
2            46    40    52    43
3            34    36    44    32
4            43    38    46    33
5            38    42    49    39

(a) Test whether the mean productivity is the same for the different machine types.
(b) Test whether the 5 men differ with respect to mean productivity.
(M. Com., 1969)

* In Illustration 4, (ΣV1)² + (ΣV2)² + (ΣV3)² are based upon totals of four items and that is why we have divided them by 4.
** (ΣB1)² + (ΣB2)² + (ΣB3)² + (ΣB4)² are based on totals of 3 items and that is why we have divided them by 3.

Solution. Let us take the hypothesis that (a) the mean productivity is the same for the four different machines, and (b) the 5 men do not differ with respect to mean productivity. To simplify calculations let us subtract 40 from each value. The coded data are given below:

                 Machine type
Workers       A     B     C     D    Total
1             4    −2     7    −4       5
2             6     0    12     3      21
3            −6    −4     4    −8     −14
4             3    −2     6    −7       0
5            −2     2     9    −1       8
Total         5    −6    38   −17      20

Correction factor = T²/N = (20)²/20 = 20

Sum of squares between machines = [(5)² + (−6)² + (38)² + (−17)²]/5 − 20
= [25 + 36 + 1444 + 289]/5 − 20
= 1794/5 − 20 = 358.8 − 20 = 338.8
ν = 4 − 1 = 3

Sum of squares between workers = [(5)² + (21)² + (−14)² + (0)² + (8)²]/4 − 20
= [25 + 441 + 196 + 0 + 64]/4 − 20
= 726/4 − 20 = 181.5 − 20 = 161.5
ν = 5 − 1 = 4

Total sum of squares = (sum of the squares of all 20 coded values) − 20
= [85 + 189 + 132 + 98 + 90] − 20   (row-wise sums of squares)
= 594 − 20 = 574
ν = 20 − 1 = 19

Residual or remainder = Total sum of squares − Sum of squares between machines − Sum of squares between workers
= 574 − 338.8 − 161.5 = 73.7
ν = 19 − 3 − 4 = 12

ANALYSIS OF VARIANCE TABLE

Source of variation       Sum of squares   d.f.   Mean square   F
Between machine types         338.8          3      112.93      112.93/6.142 = 18.39
Between workers               161.5          4       40.375      40.375/6.142 = 6.57
Remainder                      73.7         12        6.142
Total                         574.0         19

(a) For 3 and 12 degrees of freedom, F0.05 = 3.49.
Since the calculated value (18.39) is greater than the table value, we conclude that the mean productivity is not the same for the four different types of machines.
(b) For 4 and 12 degrees of freedom, F0.05 = 3.26.
Since the calculated value (6.57) is greater than the table value, we conclude that the workers differ significantly with respect to mean productivity.
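Subtracting 40 from every value ("coding") shifts all totals but leaves every sum of squares unchanged. This is why the coded data give valid results. The Python sketch below computes the two-way sums of squares from both the raw and the coded data of Illustration 5, confirms that they agree, and then forms the F ratios. The helper `two_way_ss` is a hypothetical name used only here.

```python
import math

# Units produced per day (Illustration 5): rows = workers 1-5, columns = machines A-D
raw = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]


def two_way_ss(data):
    """Return (SS between columns, SS between rows, residual SS)."""
    r, c = len(data), len(data[0])
    cf = sum(map(sum, data)) ** 2 / (r * c)
    col_totals = [sum(row[j] for row in data) for j in range(c)]
    ss_cols = sum(t * t for t in col_totals) / r - cf
    ss_rows = sum(sum(row) ** 2 for row in data) / c - cf
    ss_total = sum(x * x for row in data for x in row) - cf
    return ss_cols, ss_rows, ss_total - ss_cols - ss_rows


coded = [[x - 40 for x in row] for row in raw]   # the coding used in the text

raw_ss = two_way_ss(raw)
coded_ss = two_way_ss(coded)
# Subtracting a constant does not change any sum of squares
assert all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(raw_ss, coded_ss))

ss_machines, ss_workers, ss_resid = coded_ss
ms_resid = ss_resid / ((5 - 1) * (4 - 1))        # 12 degrees of freedom
f_machines = (ss_machines / 3) / ms_resid
f_workers = (ss_workers / 4) / ms_resid
print(round(ss_machines, 1), round(ss_workers, 1), round(ss_resid, 1))
print(round(f_machines, 2), round(f_workers, 2))
```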

Illustration 6. The sales made by four salesmen are given below:

Salesman     1     2     3     4
            50    40    48    39
            46    48    50    45
            39    44    40    39

Is there a significant difference in the sales made by the four salesmen?

Solution. Let us take the hypothesis that the mean sales of the four salesmen do not differ significantly.
Mean sales of salesman 1: X̄1 = (50 + 46 + 39)/3 = 135/3 = 45
Mean sales of salesman 2: X̄2 = (40 + 48 + 44)/3 = 132/3 = 44
Mean sales of salesman 3: X̄3 = (48 + 50 + 40)/3 = 138/3 = 46
Mean sales of salesman 4: X̄4 = (39 + 45 + 39)/3 = 123/3 = 41
Grand mean X̄ = (45 + 44 + 46 + 41)/4 = 176/4 = 44

Variance within samples

X1   (X1 − X̄1)²    X2   (X2 − X̄2)²    X3   (X3 − X̄3)²    X4   (X4 − X̄4)²
50       25        40       16        48        4        39        4
46        1        48       16        50       16        45       16
39       36        44        0        40       36        39        4
Total    62                 32                 56                 24

Sum of squares within samples = 62 + 32 + 56 + 24 = 174
Variance between samples

(X̄1 − X̄)²   (X̄2 − X̄)²   (X̄3 − X̄)²   (X̄4 − X̄)²
    1            0            4            9
    1            0            4            9
    1            0            4            9
Total 3      Total 0      Total 12     Total 27

Total sum of squares between samples = 3 + 0 + 12 + 27 = 42
ANALYSIS OF VARIANCE TABLE

Source of variation   Degrees of freedom   Sum of squares   Mean square
Between samples              3                   42            14
Within samples               8                  174            21.75

F = Larger variance / Smaller variance = 21.75 / 14 = 1.55

The table value of F for ν1 = 3 and ν2 = 8 at 5% level of significance is 4.07. The calculated value of F is less than the table value and hence the hypothesis holds true. We, therefore, conclude that the mean sales of the four salesmen do not differ significantly.
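The one-way analysis of Illustration 6 can be sketched in Python directly from its definition. The between-samples sum of squares weights each squared deviation of a sample mean from the grand mean by the sample size. The within-samples sum of squares adds the squared deviations of each item from its own sample mean. The function name `one_way_anova` is illustrative, not a library call.

```python
def one_way_anova(groups):
    """Sums of squares by deviations from the sample means and the grand mean."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ss_between, ss_within, len(groups) - 1, len(values) - len(groups)


# Sales of the four salesmen in Illustration 6 (columns X1 to X4)
salesmen = [
    [50, 46, 39],
    [40, 48, 44],
    [48, 50, 40],
    [39, 45, 39],
]
ss_b, ss_w, df_b, df_w = one_way_anova(salesmen)
ms_b, ms_w = ss_b / df_b, ss_w / df_w
f = max(ms_b, ms_w) / min(ms_b, ms_w)   # larger over smaller, as in the text
print(ss_b, ss_w, df_b, df_w, round(f, 2))
```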


Illustration 7. Below are given the yields of three strains of wheat planted in five blocks of three plots each under a completely randomized design. All the fifteen plots are of equal area. Perform the analysis of variance to test whether the strains are significantly different with regard to yield. Ignore variation between blocks.

                    Blocks
Strains     I    II   III   IV    V
A          20    21    23   16   20
B          18    20    17   15   25
C          25    28    22   28   32

5% points of F
ν2     ν1 = 2    ν1 = 3
12      3.88      3.49
13      3.80      3.41
14      3.74      3.34
15      3.68      3.29

(M. Com., Meerut, 1974)
Solution. Let us take the hypothesis that there is no difference in strains with regard to yield. Taking 20 as origin, the given data become:

                    Blocks
Strains     I    II   III   IV    V    Total
A           0    +1    +3   −4    0       0
B          −2     0    −3   −5   +5      −5
C          +5    +8    +2   +8  +12     +35
Total      +3    +9    +2   −1  +17      30

Here T = 30 and N = 15.
Correction factor = T²/N = (30)²/15 = 60

Sum of squares between strains = [(0)² + (−5)² + (35)²]/5 − T²/N
= 1250/5 − 60 = 250 − 60 = 190
ν = 3 − 1 = 2

Total sum of squares = [(0)² + (1)² + (3)² + (−4)² + (0)² + (−2)² + (0)² + (−3)² + (−5)² + (5)² + (5)² + (8)² + (2)² + (8)² + (12)²] − 60
= [0 + 1 + 9 + 16 + 0 + 4 + 0 + 9 + 25 + 25 + 25 + 64 + 4 + 64 + 144] − 60
= 390 − 60 = 330

Sum of squares within strains = Total sum of squares − Sum of squares between strains
= 330 − 190 = 140
ν = 15 − 3 = 12
ANALYSIS OF VARIANCE TABLE

Source of variation   Degrees of freedom   Sum of squares   Mean square   F
Between strains              2                  190             95        95/11.67 = 8.14
Within strains              12                  140             11.67

For ν1 = 2 and ν2 = 12, F0.05 = 3.88.

The calculated value of F is greater than the table value and hence our hypothesis does not hold good. We, therefore, conclude that the strains are significantly different with regard to yield.
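Illustration 7 uses the correction-factor shortcut on data shifted to a working origin of 20. A Python sketch of the same computation follows. The constant `ORIGIN` mirrors the text's choice. Any other origin would give the same sums of squares.

```python
# Illustration 7: yields of three strains of wheat in five plots each
strains = {
    "A": [20, 21, 23, 16, 20],
    "B": [18, 20, 17, 15, 25],
    "C": [25, 28, 22, 28, 32],
}

ORIGIN = 20   # working origin used in the text; it does not affect the result
coded = {name: [y - ORIGIN for y in ys] for name, ys in strains.items()}

values = [y for ys in coded.values() for y in ys]
N, T = len(values), sum(values)
cf = T * T / N                                                # correction factor
ss_between = sum(sum(ys) ** 2 / len(ys) for ys in coded.values()) - cf
ss_total = sum(y * y for y in values) - cf
ss_within = ss_total - ss_between
df_between, df_within = len(coded) - 1, N - len(coded)

F = (ss_between / df_between) / (ss_within / df_within)
print(cf, ss_between, ss_within, round(F, 2))   # compare F with F0.05(2, 12) = 3.88
```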


Illustration 8. The Amrit Merchandising Company wishes to test whether its three salesmen A, B and C tend to make sales of the same size or whether they differ in their selling ability as measured by the average size of their sales. During the last week there have been 14 sale calls: A made 5 calls, B made 4 calls and C made 5 calls. Following are the weekly sales records of the three salesmen:

   A       B       C
  Rs.     Rs.     Rs.
  300     600     700
  400     300     300
  300     300     400
  500     400     600
    0      -      500

Perform the analysis and draw your conclusions.
(M. Com., Delhi, 1973)


Solution. Let us take the hypothesis that the sales of the three salesmen are of the same size, i.e., that they do not differ in selling ability. In order to simplify calculations let us divide each value by 100. So the coded data are:

 A    B    C
 3    6    7
 4    3    3
 3    3    4
 5    4    6
 0    -    5

Mean of A: X̄1 = 15/5 = 3
Mean of B: X̄2 = 16/4 = 4
Mean of C: X̄3 = 25/5 = 5
Grand mean X̄ = 56/14 = 4

VARIANCE WITHIN SAMPLES

X1   (X1 − 3)²    X2   (X2 − 4)²    X3   (X3 − 5)²
 3       0         6       4         7       4
 4       1         3       1         3       4
 3       0         3       1         4       1
 5       4         4       0         6       1
 0       9                           5       0
15      14        16       6        25      10

Total sum of squares within samples = 14 + 6 + 10 = 30
ν = 14 − 3 = 11

VARIANCE BETWEEN SAMPLES

Sum of squares between samples = 5(3 − 4)² + 4(4 − 4)² + 5(5 − 4)² = 5 + 0 + 5 = 10
ν = 3 − 1 = 2

ANALYSIS OF VARIANCE TABLE

Source of variation   Degrees of freedom   Sum of squares   Mean square
Between samples              2                   10             5.00
Within samples              11                   30             2.73

F = 5.00/2.73 = 1.83

For ν1 = 2 and ν2 = 11, F0.05 = 3.98.

The calculated value of F is less than the table value. Hence the hypothesis holds good, and we conclude that the salesmen do not differ significantly in their selling ability as measured by the average size of their sales.
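With unequal sample sizes (5, 4 and 5 calls), each sample's squared deviation from the grand mean must be weighted by that sample's size. The Python sketch below recomputes Illustration 8 on the coded data (hundreds of rupees). It uses the standard-library `fractions` module so that every intermediate value stays exact.

```python
from fractions import Fraction

# Illustration 8, sales in hundreds of rupees (A made 5 calls, B 4, C 5)
sales = {
    "A": [3, 4, 3, 5, 0],
    "B": [6, 3, 3, 4],
    "C": [7, 3, 4, 6, 5],
}

all_sales = [x for xs in sales.values() for x in xs]
grand_mean = Fraction(sum(all_sales), len(all_sales))
means = {k: Fraction(sum(xs), len(xs)) for k, xs in sales.items()}

# Between samples: each squared deviation is weighted by the sample size
ss_between = sum(len(xs) * (means[k] - grand_mean) ** 2 for k, xs in sales.items())
ss_within = sum((x - means[k]) ** 2 for k, xs in sales.items() for x in xs)
df_between = len(sales) - 1
df_within = len(all_sales) - len(sales)

F = (ss_between / df_between) / (ss_within / df_within)
print(ss_between, ss_within, float(F))
```

As a check, the two components add up to the total sum of squares of the coded data, Σx² − T²/N = 264 − 224 = 40.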


[...] the obscuring effect of the variability inherent in the material and the nature of the observations. Not only was randomization needed in order to remove bias, but also for making valid estimates of standard errors. [...] results of high precision. To Fisher goes most of the credit for solving these problems and creating a new branch of science from which experimentation in many fields of research has since benefited.

The randomized block design originated in agricultural research, where one may wish to compare the effects of different makes of fertilizers on the yield of soyabeans. Differences in yield may then be due not only to the fertilizers but also to differences in the land. To isolate this "block effect", the design of assigning treatments at random to the plots of each block of land is employed. The blocks are formed in such a way that each contains as many plots as there are treatments, and one plot from each block is randomly selected for each treatment. A field plan for an agricultural experiment, say for four treatments (A, B, C, D) in six blocks of four plots, may be arranged as shown below:

[Field plan: six blocks of four plots each, with treatments A, B, C, D allotted at random within each block.]
A-62 DESIGN OF EXPERIMENTS
Despite its agricultural origin, the randomized blocks design is 
widely used in many types of experiments. For instance, to determine 
the differences in productivity of different makes of machines (treatments) 
we may isolate the possible effects due to differences in efficiency among 
operators (blocks) by assigning the machines at random to randomly 
selected operators. The basic idea here is to compare all treatment effects 
within a block of experimental material by eliminating the environmental 
effects. 
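The within-block randomization described above can be sketched in a few lines of Python. This is an illustration rather than a prescribed procedure. It assumes the six-block, four-treatment plan mentioned earlier, and the seed is an arbitrary value chosen only so that the layout can be reproduced.

```python
import random


def randomized_block_layout(treatments, n_blocks, seed=None):
    """Give every block one plot per treatment, in an independent random order."""
    rng = random.Random(seed)
    return [rng.sample(treatments, len(treatments)) for _ in range(n_blocks)]


layout = randomized_block_layout(["A", "B", "C", "D"], n_blocks=6, seed=2024)
for number, block in enumerate(layout, start=1):
    print(f"Block {number}: " + "  ".join(block))
```

Each block contains every treatment exactly once, so block-to-block differences (soil, or operators in the machine example) are removed from the treatment comparison.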

The analysis of variance table for a randomized block design will, in general, have the following form*:

Source of variation        Sum of squares   Degrees of freedom   Mean squares
Column treatments              SSC              (c − 1)            MSC = SSC/(c − 1)
Row treatments (Blocks)        SSR              (r − 1)            MSR = SSR/(r − 1)
Remainder (or Error)           SSE           (r − 1)(c − 1)        MSE = SSE/[(r − 1)(c − 1)]
Total                          SST              rc − 1


By comparing the treatment mean square with the remainder mean square, we can decide by an F-test whether the treatments have any effect, regardless of whether there is a significant variation from block to block.
Advantages of a Completely Randomised Experimental Design. The follow- 
ing are the main advantages of this type of design : 

1. It allows for complete flexibility. Any number of factor classes 
and replications may be used. 

2. The statistical analysis is relatively simple, even if we do not 
have the same number of replicates for each factor class or if the experi- 
mental errors are not the same from class to class of this factor. 


3. The method of analysis remains simple when data are missing 
or rejected, and the loss of information due to missing data is smaller 
than with any other design. 

Illustration 1. The following data represent the number of units of production per day turned out by 4 different workmen using 5 different types of machines:

(a) Test to see whether the 4 men differ with respect to mean productivity.

(b) Test to see whether the mean productivity is the same for the 5 different machine types.

                  Machine type
Workmen      1     2     3     4     5
1            8    10     7    12     6
2           12    13     8     9    12
3            7     8     6     8     8
4            5     5     3     5    14
Total       32    36    24    34    40

Solution. Let us take the hypothesis that
(a) The 4 men do not differ with regard to mean productivity.
(b) The mean productivity is the same for the 5 different machine types.
* SSC = Variation between column means; SSR = Variation between row means;
SSE = Variation within or for errors; SST = Total variation;
MSC = Mean square between column means; MSR = Mean square between row means.
                  Machine type
Workmen      1     2     3     4     5    Total   Mean
1            8    10     7    12     6      43     8.6
2           12    13     8     9    12      54    10.8
3            7     8     6     8     8      37     7.4
4            5     5     3     5    14      32     6.4
Total       32    36    24    34    40     166
Mean         8     9     6   8.5    10

Correction factor = T²/N = (166)²/20 = 27556/20 = 1377.8

Total sum of squares = [(8)² + (12)² + (7)² + (5)² + (10)² + (13)² + (8)² + (5)² + (7)² + (8)² + (6)² + (3)² + (12)² + (9)² + (8)² + (5)² + (6)² + (12)² + (8)² + (14)²] − 1377.8
= [64 + 144 + 49 + 25 + 100 + 169 + 64 + 25 + 49 + 64 + 36 + 9 + 144 + 81 + 64 + 25 + 36 + 144 + 64 + 196] − 1377.8
= 1552 − 1377.8 = 174.2

Sum of squares between columns (machine type) = (1/4)[(32)² + (36)² + (24)² + (34)² + (40)²] − T²/N
= (1/4)[1024 + 1296 + 576 + 1156 + 1600] − 1377.8
= (1/4)[5652] − 1377.8
= 1413 − 1377.8 = 35.2

Sum of squares between rows (workmen) = (1/5)[(43)² + (54)² + (37)² + (32)²] − T²/N
= (1/5)[1849 + 2916 + 1369 + 1024] − 1377.8
= (1/5)[7158] − 1377.8
= 1431.6 − 1377.8 = 53.8

Residual or Error = Total sum of squares − (Sum of squares between columns + Sum of squares between rows)
= 174.2 − (35.2 + 53.8)
= 174.2 − 89.0 = 85.2
ANALYSIS OF VARIANCE TABLE

Source of variation     S.S.    Degrees of freedom     M.S.          F
Between workmen         53.8    4 − 1 = 3              17.93 (MSR)   17.93/7.10 = 2.53
Between machine types   35.2    5 − 1 = 4               8.80 (MSC)    8.80/7.10 = 1.24
Error                   85.2    (4 − 1)(5 − 1) = 12     7.10 (MSE)
Total                  174.2    19

(a) For ν1 = 3 and ν2 = 12, F0.05 = 3.49. The calculated value is less than the table value. Hence the mean productivity of workmen does not differ significantly.

(b) For ν1 = 4 and ν2 = 12, F0.05 = 3.26. The calculated value is less than the table value. Hence the mean productivity of the different machines does not differ significantly.
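The general randomized block table given earlier (SSC, SSR, SSE with their mean squares) translates directly into a small Python function. The sketch below applies it to the data of Illustration 1. The name `rbd_anova` and the dictionary keys are illustrative choices that mirror the table's abbreviations.

```python
def rbd_anova(data):
    """ANOVA for a randomized block layout: r rows (blocks) by c columns.

    Follows the general table: SSC, SSR and SSE with (c - 1), (r - 1)
    and (r - 1)(c - 1) degrees of freedom.
    """
    r, c = len(data), len(data[0])
    cf = sum(map(sum, data)) ** 2 / (r * c)
    ssc = sum(sum(row[j] for row in data) ** 2 for j in range(c)) / r - cf
    ssr = sum(sum(row) ** 2 for row in data) / c - cf
    sst = sum(x * x for row in data for x in row) - cf
    sse = sst - ssc - ssr
    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((r - 1) * (c - 1))
    return {"SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
            "MSC": msc, "MSR": msr, "MSE": mse,
            "F_columns": msc / mse, "F_rows": msr / mse}


# Illustration 1: rows = workmen 1-4, columns = machine types 1-5
output = [
    [8, 10, 7, 12, 6],
    [12, 13, 8, 9, 12],
    [7, 8, 6, 8, 8],
    [5, 5, 3, 5, 14],
]
table = rbd_anova(output)
for key, value in table.items():
    print(f"{key}: {value:.2f}")
```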


Illustration 2. Three varieties A, B, C of a crop are tested in a randomized block design with four replications. The plot yields in pounds are as follows:

A 6    C 5    A 8    B 9
C 8    A 4    B 6    C 9
B 7    B 6    C 10   A 6

Analyse the experimental yield and state your conclusion.
(B. Com., Bombay, 1974)


Solution. Let us take the hypothesis that the variances between varieties and between blocks do not differ significantly from the variance due to random errors.

CALCULATION OF SQUARES

                 Blocks or Replications
Variety      1          2          3          4         Total
A            6          4          8          6         24 (V1)
B            7          6          6          9         28 (V2)
C            8          5         10          9         32 (V3)
Total       21 (B1)    15 (B2)    24 (B3)    24 (B4)    84 (T)

Correction factor = T²/N = (84)²/12 = 588

Sum of squares between blocks = [(B1)² + (B2)² + (B3)² + (B4)²]/3 − C.F.
= [(21)² + (15)² + (24)² + (24)²]/3 − 588
= [441 + 225 + 576 + 576]/3 − 588
= 1818/3 − 588 = 606 − 588 = 18

Sum of squares between varieties = [(V1)² + (V2)² + (V3)²]/4 − C.F.
= [(24)² + (28)² + (32)²]/4 − 588
= 2384/4 − 588 = 596 − 588 = 8

Total sum of squares = [(6)² + (7)² + (8)² + (4)² + (6)² + (5)² + (8)² + (6)² + (10)² + (6)² + (9)² + (9)²] − C.F.
= [36 + 49 + 64 + 16 + 36 + 25 + 64 + 36 + 100 + 36 + 81 + 81] − 588
= 624 − 588 = 36

Residual sum of squares = 36 − (18 + 8) = 10


ANALYSIS OF VARIANCE TABLE

Source              Degrees of freedom    Sum of squares   Mean square   F
Between varieties   3 − 1 = 2                   8              4         2.4
Between blocks      4 − 1 = 3                  18              6         3.6
Residual            11 − (3 + 2) = 6           10              1.667

The table value of F for ν1 = 2 and ν2 = 6 at 5% level of significance is 5.14, and for ν1 = 3 and ν2 = 6 it is 4.76. In both cases the calculated value of F is less than the table value and hence the variances between varieties and between blocks do not differ significantly from the variance due to random errors.
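The first step in Illustration 2 is to rearrange the field plan into a variety-by-block table before any sums of squares are computed. The Python sketch below automates that step. It assumes each plot is written as a single variety letter followed by the yield, as in the plan above. It also raises an error if a variety appears twice in a block, which would violate the randomized block design.

```python
# Field plan of Illustration 2: each column is one block (replication);
# each plot is written as the variety letter followed by the yield in lb.
field_plan = [
    ["A6", "C5", "A8", "B9"],
    ["C8", "A4", "B6", "C9"],
    ["B7", "B6", "C10", "A6"],
]


def tabulate(plan):
    """Rearrange a field plan into {variety: [yield in block 1, block 2, ...]}."""
    n_blocks = len(plan[0])
    table = {}
    for row in plan:
        for block, plot in enumerate(row):
            variety, yield_lb = plot[0], int(plot[1:])
            row_for_variety = table.setdefault(variety, [None] * n_blocks)
            if row_for_variety[block] is not None:
                raise ValueError(f"{variety} appears twice in block {block + 1}")
            row_for_variety[block] = yield_lb
    return dict(sorted(table.items()))


table = tabulate(field_plan)
block_totals = [sum(ys[j] for ys in table.values()) for j in range(len(field_plan[0]))]
print(table)          # one row per variety, as in the calculation table
print(block_totals)   # B1, B2, B3, B4
```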


LATIN SQUARES 


Latin squares are very extensively used in agricultural trials in order 
to eliminate fertility trends in two directions simultaneously. The data 
are classified according to the different criteria, i.e., according to columns, 
rows and varieties and are arranged in a square known as Latin Square. 
The term Latin square takes its name from a figure of mathematical 
puzzle that was studied many years before its use as a plan of experiment. 
In this design there have to be as many replications as there are treatments. The experimental area is divided into plots arranged in a square in such a manner that there are as many plots in each row as there are in each column, this number being also equal to the number of treatments. The plots are then assigned to the various treatments such that every treatment occurs only once in each row and once in each column. This can be done in a large number of ways, and the way it is to be done in any particular layout must be determined randomly. Suppose the data are classified according to rows, columns and varieties, varieties being represented by the letters A, B, C, etc. Then a Latin square is an arrangement of the letters (i.e., varieties) in a square in such a way that each letter (variety) occurs once and only once in each row and each column. A Latin square of order m is an arrangement of m symbols or letters in an m × m square such that each symbol occurs once and only once in each row and each column. There will be m rows, m columns and m varieties. Every symbol will appear m times in a Latin square. By various permutations and combinations, letters or symbols can be arranged in several different ways and thus several different Latin squares can be constructed, but each symbol or variety appears once and only once in each column and each row. The numbers of varieties, rows and columns are all equal. For example, in a 5 by 5 (5 × 5) Latin square, where data may be classified according to rows, columns and varieties, varieties being represented by letters A, B, C, D, E, the arrangement may be as below:


[Figure: two 5 × 5 Latin square arrangements of varieties A, B, C, D, E, with rows and columns numbered 1 to 5.]
Thus the total number of possibilities in which the arrangement can be made is very large. The totals are given in the following table:

Size of square    No. of different squares
2 × 2             2
3 × 3             12
4 × 4             576
5 × 5             161,280
6 × 6             812,851,200
7 × 7             61,479,419,904,000
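The defining property of a Latin square is easy to check by machine. For very small orders, the counts in the table above can also be confirmed by exhaustive search. The Python sketch below does both. The backtracking counter is practical only up to about order 4 or 5, which illustrates how quickly the totals grow. The function names are illustrative.

```python
def is_latin_square(square):
    """True if every symbol occurs once and only once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n or any(len(row) != n for row in square):
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def count_latin_squares(n):
    """Count all n x n Latin squares by backtracking (practical only for small n)."""
    grid = [[None] * n for _ in range(n)]

    def fill(cell):
        if cell == n * n:
            return 1
        i, j = divmod(cell, n)
        total = 0
        for s in range(n):
            used_in_row = s in grid[i][:j]
            used_in_col = any(grid[k][j] == s for k in range(i))
            if not (used_in_row or used_in_col):
                grid[i][j] = s
                total += fill(cell + 1)
        grid[i][j] = None
        return total

    return fill(0)


cyclic = [["ABCDE"[(i + j) % 5] for j in range(5)] for i in range(5)]
print(is_latin_square(cyclic))                        # a 5 x 5 cyclic square
print([count_latin_squares(n) for n in (2, 3, 4)])   # first entries of the table
```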



No simple formula exists, and the totals for larger squares are not known. Fairly rapid procedures for the selection of a random square up to 7 × 7 have been devised (Fisher & Yates, 1953; Kitagawa & Mitome, 1953).

A Latin square for use should ideally be selected at random from all possible squares of the same size, but, as explained above, there are practical difficulties because for the larger squares the total number of possibilities is very large.
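A common practical shortcut, offered here only as an illustration, is to start from a cyclic square and randomly permute its rows, its columns and its letters. The Python sketch below always produces a valid Latin square. However, it is not a uniform draw from all squares of that size, because from order 4 upwards some squares cannot be reached this way. The published selection procedures cited above remain the appropriate route when a strictly random square is required.

```python
import random


def random_latin_square(letters, seed=None):
    """Randomly permute the rows, columns and letters of a cyclic Latin square.

    The result is always a valid Latin square, but this is NOT a uniform draw
    from all squares of the given size.
    """
    rng = random.Random(seed)
    n = len(letters)
    row_perm = rng.sample(range(n), n)
    col_perm = rng.sample(range(n), n)
    symbols = rng.sample(list(letters), n)
    # Entry (i, j) of the cyclic square is (i + j) mod n; relabel everything
    return [[symbols[(row_perm[i] + col_perm[j]) % n] for j in range(n)]
            for i in range(n)]


for row in random_latin_square("ABCDE", seed=7):
    print(" ".join(row))
```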


Significance of Latin Square

Latin squares are most [...] In other fields of research, the [...] applied [...], the square permitting the elimination of positional effects on a large agar plate.

In fact a Latin square design may well be useful when we wish to 
remove from an analysis of data the effect of a factor which we are not
interested in, but which is known to be significant. If we are not careful 
to arrange our experiments so that the effect of the factor is separable 
from the other effects, we may find that we have confounded that effect 
with an effect in which we are interested. 
Assumption in the Analysis of Latin Square

The Latin Square model assumes that interactions between treat- 
ments and row and column groupings are non-existent. Since each 
treatment occurs only once in each row or column, if interactions are 
present, it is possible for them to cause an apparently significant difference 
between treatments. This is one of the reasons why it is important to 
choose rows and columns of a particular Latin Square in a random way. 
Interactions, if present, can then be viewed as random elements that are
part of the error term. They blow up the error variance and make
the test less efficient, but their randomization still allows for a valid
theoretical test.

Steps in Analysing a Latin Square

The analysis of a Latin square involves the following steps:

1. Compute the correction factor by squaring the grand total and dividing it by the number of observations.

2. Compute the total sum of squares by adding the squares of the individual observations and subtracting the correction factor.

3. Compute the row sum of squares by adding the squares of the row sums, dividing by the number of items in a row, and subtracting the correction factor.

4. Compute the column sum of squares by adding the squares of the column sums, dividing by the number of items in a column, and subtracting the correction factor.

5. Compute the 'treatment' sum of squares by summing the squares of the treatment sums, dividing by the number of items in each treatment sum (which equals the number of treatments), and subtracting the correction factor.

6. Compute the remainder sum of squares by subtracting the sums of squares of steps 3, 4 and 5 from the total sum of squares of step 2.

7. Enter these sums of squares in an analysis of variance table and compute the various mean squares.

ANALYSIS OF VARIANCE TABLE

Source of variance                      S.S.   Degrees of freedom   Mean square
Rows                                           m − 1
Columns                                        m − 1
Treatments                                     m − 1
Remainder (experimental error
  plus interaction)                            (m − 1)(m − 2)
Total                                          m² − 1


8. The last step is to carry out an F-test by comparing the treatment mean square with the remainder mean square.
Illustration 3. Five varieties of wheat, A, B, C, D and E, were tried. The gross size of the plot was 18 feet × 22 feet, the net plot being 14 feet × 18 feet. Thus the whole experiment occupied an area 90 feet × 110 feet. The plan, the varieties shown in each plot and the yields obtained in kg are given in the following table:

B 90     E 80     C 134    A 112    D 92
E 85     D 84     B 70     C 141    A 82
C 111    A 90     D 87     B 84     E 69
A 81     C 125    E 85     D 76     B 72
D 82     B 60     A 94     E 85     C 88

Carry out an analysis of variance. What inference can you draw from the data?


Solution. It is a 5 × 5 Latin square. Let us subtract 90 from each value; the table values will be as follows:

               Columns                    Σ for          Squares
Row        1     2     3     4     5      rows      1      2      3      4      5
1          0   −10   +44   +22    +2       +58      0    100   1936    484      4
2         −5    −6   −20   +51    −8       +12     25     36    400   2601     64
3        +21     0    −3    −6   −21        −9    441      0      9     36    441
4         −9   +35    −5   −14   −18       −11     81   1225     25    196    324
5         −8   −30    +4    −5    −2       −41     64    900     16     25      4
Σ for
columns   −1   −11   +20   +48   −47        +9    611   2261   2386   3342    837

Σx² = 611 + 2261 + 2386 + 3342 + 837 = 9437

Treatment    1      2      3      4      5     Total
A           −9      0     +4    +22     −8       +9
B            0    −30    −20     −6    −18      −74
C          +21    +35    +44    +51     −2     +149
D           −8     −6     −3    −14     +2      −29
E           −5    −10     −5     −5    −21      −46
N = 5 × 5 = 25

(1) Total sum of squares = Σx² − T²/N
= 9437 − (9)²/25
= 9437 − 3.24 = 9433.76 or 9433.8
Total degrees of freedom = 25 − 1 = 24


(2) Sum of squares between columns
= (1/5)[(−1)² + (−11)² + (20)² + (48)² + (−47)²] − T²/N
= (1/5)[1 + 121 + 400 + 2304 + 2209] − 3.24
= 5035/5 − 3.24
= 1007 − 3.24 = 1003.76 or 1003.8
ν = c − 1 = 4
(3) Sum of squares between rows
= (1/5)[(58)² + (12)² + (−9)² + (−11)² + (−41)²] − T²/N
= (1/5)[3364 + 144 + 81 + 121 + 1681] − 3.24
= (1/5)(5391) − 3.24
= 1078.2 − 3.24 = 1074.96 or 1075
ν = r − 1 = 4

(4) Sum of squares between treatments
= (1/5)[(9)² + (−74)² + (149)² + (−29)² + (−46)²] − T²/N
= (1/5)[81 + 5476 + 22201 + 841 + 2116] − 3.24
= (1/5)[30715] − 3.24
= 6143 − 3.24 = 6139.76 or 6139.8
Degrees of freedom between treatments = t − 1 or 5 − 1 = 4
(5) Residual error or remainder
= 9433.8 − 1003.8 − 1075 − 6139.8 = 1215.2
ν = 24 − (4 + 4 + 4) = 12
The results of analysis of the above experiment are given below:

THE ANALYSIS OF VARIANCE OF PLOT YIELD IN KG.

Source      ν     S.S.      M.S.     F ratio observed   F0.05
Rows        4    1075.0     268.8          2.65           3.26
Columns     4    1003.8     251.0          2.48           3.26
Varieties   4    6139.8    1535.0         15.15           3.26
Error      12    1215.2     101.3


The F-ratios for rows and columns are not significant at 5% level while that for varieties is very highly significant. The fact that there are no significant differences between rows and columns shows that the Latin square arrangement has not been advantageous.
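Steps 1 to 7 of the Latin square analysis can be sketched as a single Python function and applied to the yields of Illustration 3. The sketch works on the raw yields rather than the text's values coded by subtracting 90. Coding leaves every sum of squares unchanged, so the results agree with the text. `PLAN` and `latin_square_anova` are illustrative names.

```python
from collections import defaultdict

# Illustration 3: variety letter and yield (kg) for each plot, row by row
PLAN = [
    "B90 E80 C134 A112 D92",
    "E85 D84 B70 C141 A82",
    "C111 A90 D87 B84 E69",
    "A81 C125 E85 D76 B72",
    "D82 B60 A94 E85 C88",
]


def latin_square_anova(plots):
    """Steps 1-7 of the text for an m x m Latin square of (letter, yield) pairs."""
    m = len(plots)
    values = [y for row in plots for _, y in row]
    cf = sum(values) ** 2 / len(values)                                     # step 1
    ss_total = sum(y * y for y in values) - cf                              # step 2
    ss_rows = sum(sum(y for _, y in row) ** 2 for row in plots) / m - cf    # step 3
    ss_cols = sum(sum(plots[i][j][1] for i in range(m)) ** 2
                  for j in range(m)) / m - cf                               # step 4
    totals = defaultdict(int)
    for row in plots:
        for letter, y in row:
            totals[letter] += y
    ss_treat = sum(t * t for t in totals.values()) / m - cf                 # step 5
    ss_error = ss_total - ss_rows - ss_cols - ss_treat                      # step 6
    df, df_error = m - 1, (m - 1) * (m - 2)
    ms_error = ss_error / df_error                                          # step 7
    table = {name: (ss, df, ss / df / ms_error)
             for name, ss in (("rows", ss_rows), ("columns", ss_cols),
                              ("varieties", ss_treat))}
    table["error"] = (ss_error, df_error, None)
    return table


plots = [[(p[0], int(p[1:])) for p in row.split()] for row in PLAN]
for source, (ss, df, f) in latin_square_anova(plots).items():
    print(source, round(ss, 2), df, None if f is None else round(f, 2))
```

The F ratio for varieties comes out marginally above the text's 15.15 because the text rounds the mean squares before dividing.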
Randomized Blocks Vs. Latin Square

The randomized block design is superior to the Latin square in many ways. The randomized block design is available for a wide range of treatments (2 to 24) and there is no restriction on the number of replications. The analysis of variance is also more flexible. If there is an attack of some pest or disease in one or two of the blocks, the data for these blocks can be easily omitted without any complication in the analysis, while results from a Latin square experiment necessitate a much more complicated analysis under similar circumstances. In the field also the randomized block trial is easier to manage. It can be accommodated equally well in a rectangular or square field or a field of any other shape, while for a Latin square trial it is necessary that the shape of the field should be approximately square or rectangular. When there are simultaneous trends of fertility variations in two directions at right angles (or what amounts to a diagonal trend in fertility), the Latin square design is likely to be efficient.*

The Latin square arrangement is suitable only in the special cases where the land exhibits marked trends in fertility. This design, since it requires as many replications as there are treatments, is suitable chiefly for 5 to 10 treatments.† For comparison of a smaller number of treatments the number of replications is found to be inadequate, while for a larger number of treatments the number has to be unduly increased. For a small number of treatments, however, more than one Latin square may be laid out to secure adequate replication.

LATIN CUBES 


The basic idea of a Latin square can be extended to patterns in three dimensions or more, but practical applications of Latin cubes and related designs are few.

Factorial Experiment 


In an endeavour to improve the logical foundations of scientific experimentation, factorial design has proved one of the most fruitful developments. Factorial experiments permit the experimenter to evaluate [...] variable considered singly. For example, the factors affecting the growth and yield of a crop (manuring, seed rate, methods of cultivation, dates [...]). The three main reasons for including levels of several factors in one experiment are:

(i) To obtain information on the average effect of all the factors economically, from a single experiment of moderate size;

(ii) To broaden the basis of inference on one factor by testing it under varied conditions of others; and

(iii) To assess the manner in which the effects of factors interact with one another. These are not entirely independent, but the emphasis varies with the subject of experimentation.

A detailed analysis of factorial experiments is beyond the scope of this book. Those interested may please refer to the books suggested below:

* Statistical Methods for Agricultural Workers: Panse & Sukhatme.
† The reason for this is that degrees of freedom for error should not be below 12.

SUGGESTED READINGS

Finney: Experimental Design and its Statistical Basis.
R. A. Fisher: Statistical Methods for Research Workers.
Cochran & Cox: Experimental Designs.
Panse & Sukhatme: Statistical Methods for Agricultural Workers.
Yates: The Design and Analysis of Factorial Experiments.


4. Statistical Quality Control

In this era of ever-growing competition it has become absolutely necessary for a businessman to keep a continuous watch over the quality of the goods produced. Having once bought the product, if the consumer [...]. However, if the consumers are not happy with the quality of the product and their complaints are not given proper attention, it shall be impossible for the manufacturer to continue in the market. Either he would have to [...]

Although the need for maintaining and improving quality standards is growing with increasing competition, the idea of quality control is not a new one. For centuries, highly skilled artisans have striven to make products distinctive through superior quality and, once a standard of quality was achieved, to eliminate in so far as possible all variability between products that were nominally alike. What is new about quality [...] manufacture of a very large number of components. The idea that statistics might be instrumental in controlling the quality of manufactured products goes back to the 1920's and 1930's, but it was not until the pressure of the production needs developed during the period of World War II that its value was fully appreciated. During that period, the use of this technique [...]

It is important to distinguish between the unsystematic inspection and supervision which often goes under the name of "quality control", and statistical quality control. The former does not say when or how samples should be taken or how large they should be. Also it does not have the advantages that go with graphic presentation, and a clear, objective standard is not enforced for "take action" or "skip it". The statistical quality control chart makes use of well thought-out, tested rules and avoids the indecision, inconsistency and arbitrariness of haphazard quality control. Statistical control is based on the fact that repeated random samples from a fixed population will vary, but in a predictable pattern.
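The "predictable pattern" in repeated samples can be seen in a short simulation. The Python sketch below assumes a hypothetical stable process: 2-inch bolts with normally distributed chance variation and an assumed standard deviation of 0.01 inch, sampled five at a time. These figures are illustrative and not taken from the text. The spread of the sample means matches the theoretical σ/√n. Almost every sample mean falls within three such standard errors of the process mean.

```python
import random
import statistics

# Hypothetical stable process: nominal length 2 inches, chance variation
# with an assumed standard deviation of 0.01 inch; samples of 5 items.
MU, SIGMA, N = 2.0, 0.01, 5

rng = random.Random(1)
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(10_000)
]

standard_error = SIGMA / N ** 0.5          # theory: s.d. of sample means
observed_sd = statistics.stdev(sample_means)
share_within_3se = sum(
    abs(m - MU) <= 3 * standard_error for m in sample_means
) / len(sample_means)

print(f"theoretical s.d. of means {standard_error:.5f}, observed {observed_sd:.5f}")
print(f"share of sample means within 3 s.e. of 2 inches: {share_within_3se:.4f}")
```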

The term 'quality' in statistical quality control is usually related to some measurement made on the items produced, a good quality item being one which conforms to the standards specified for the measurement. Quality does not always imply the highest standards of manufacture, for the standard required is often deliberately below the highest possible. It is almost always consistency in quality standards which represents the most desirable situation, rather than the absolute standard which is maintained.


The need for quality control arises because of the fact that even after the quality standards have been specified some variation in quality is unavoidable. For example, suppose a machine is producing 100,000 bolts per day of 2" length. It is very unlikely that all the bolts are exactly 2" in length. If the measuring instrument is sufficiently precise we can detect some bolts which are slightly less than 2" and some which are slightly more than 2". This leads to a search for the possible causes of variation in the product. The variation of a quality characteristic can be divided under two heads:

(i) Chance variation, i.e., variation which results from many minor
causes that behave in a random manner. This type of variation is
permissible, and indeed inevitable, in manufacturing. There is no way in
which it can be completely eliminated. When the variability present in a
production process is confined to chance variation, the process is said to
be in a state of statistical control.

(ii) Assignable variation, i.e., variation that may be attributed
to special non-random causes. Such variation can be the result of
several factors such as a change in the raw material, a new operation,
improper machine setting, broken or worn parts, mechanical faults in
plant, etc.

Of these two types of variation, nothing can be done about the
former. However, assignable variation can be detected and corrected.
The value of quality control lies in the fact that assignable variations in a
process can be quickly detected ; in fact, these variations are often
discovered before the product becomes defective.

There are two different ways of controlling the quality of a
product :

(i) Through 100% inspection, i.e., by inspecting each and every
item that is produced ; and

(ii) Through sampling technique or the use of statistical quality
control.

The system of 100% inspection is not very satisfactory because of the 
following reasons : 

(i) It is too expensive. 

(ii) It is not always reliable, because inspecting each and every item
becomes too much of a routine for the inspectors, and defective pieces may
be labelled ‘satisfactory’. Defective pieces may also be passed at times
when distraction occurs. For example, even when an inspector is trying
to perform his task conscientiously, he may at times pass faulty pieces
when somebody talks to him or someone else happens to attract his
attention.

(iii) The inspection is made at the end of the manufacturing cycle,
and hence provides little control over the manufacturing process.


Thus we find that even 100% inspection is not infallible. In an 


every item produced and for indicating whether or not the variations which 
occur are exceeding normal expectations. The statistical control of quality 
makes use of the theory of sampling and tests of significance.


Quality control methods are applied to two distinct phases of plant


(i) The control of a process during manufacture. A process is said
to be in a state of statistical control if the variation is such as would occur
in random sampling from some stable population. If this is the case, the
variation among the items is attributable to chance and there is no point
in seeking special causes for individual cases. But when the process is
out of control, it should be possible to locate specific causes for the
variation and, by removing them, to improve the future performance of the
process. Statistical quality control may be applied to any repetitive
process. Such processes are found not merely in machine production in a
factory but also in many management problems. Statistical quality
control methods have been used in connection with such diverse problems
as the stamping of bottle caps, errors in the work of accountants, the
filling of cartons, complaints received from customers, and airline
reservations. The statistical tool applied in process control is the control
chart. The primary objectives of process control are: (a) to keep the

(ii) The inspection of materials to determine their acceptability,
whether they be in raw, semi-finished or completed state. This is known


to what is required of the product rather than by the inherent capabilities
of the process, as in process control. In process control the population
is the infinite number of possible results from the same repetitive process. 
In sampling inspection the population is the finite group of items which 
have been produced, usually referred to as a lot. 


Control Charts 


A control chart is a statistical device principally used for the study
and control of repetitive processes.

Dr. Walter A. Shewhart, its originator, suggested that the control chart
may serve, first, to define the goal or standard for the process that the
management might strive to attain; secondly, it may be used as an
instrument to attain that goal; and, thirdly, it may serve as a means of
judging whether the goal is being achieved. Thus it is an instrument to
be used in specification, production and inspection and is the core of
statistical quality control.

A control chart is essentially a graphic device for presenting data
so as to directly reveal the frequency and extent of variations from
established standards or goals. Control charts are simple to construct and
easy to interpret, and they tell the manager at a glance whether or not
the process is in control, i.e., within the control limits. A control chart
consists of three horizontal lines :

(i) A central line to indicate the desired standard or level of the 
process ; 

(ii) Upper control limit ; and 

(iii) Lower control limit. 

A specimen of the control chart is given below : 


OUTLINE OF A CONTROL CHART

[Figure: vertical axis, quality scale; horizontal axis, sample (sub-group) number. A central line marks the average, and the upper and lower control limits lie 3 sigmas above and below it; points beyond either limit are marked “out of control”.]


From time to time a sample is taken and the data are plotted on
the graph. So long as the sample points fall within the upper and lower
control limits there is nothing to worry about, as in such a case the variation
between the samples is attributed to chance or unknown causes.

It is only when a sample point falls outside the control limits that
it is considered to be a danger signal indicating that assignable causes are
bringing about variations. Thus no time and money are wasted
in an effort to find the reason for random variation, but as soon as an
assignable cause is apparent, necessary corrective action is taken.
Generally, if all dots are found between the upper and lower
control limits it is assumed that the process is “in control” and only
chance causes are present. However, sometimes dots are found arranged
in some peculiar way. Although they appear between the control limits,
a substantial number of successive dots may be located on the same side
of the central line, or successive dots may follow a definite path leading
towards the upper or lower control limit. Such patterns of dots within the
control limits should also be considered as danger signals which may
indicate a change in the production process. Thus control charts are not
only watched for points falling outside the control limits; they are also
scrutinised for unusual patterns suggesting trouble.

The control chart may be “likened to a highway whose control limits
are the shoulders on one side and the centre line on the other. No car
driving along the highway can maintain a perfectly straight path.
Unevenness in the road, play in the steering wheel, gusts of wind, and a
host of other factors cause slight variations in the path of the car. It
would hardly be worthwhile to investigate the causes of the small
irregularities. However, the moment the car swerves outside one of the
limits, an assignable cause can be assumed to exist and the investigation
should begin. The cause may turn out to be a defect in the steering
mechanism, a sleepy driver or some similar correctable factor.”*

How to set up the Control Limits.† The basis of a control chart is
the setting up of upper and lower control limits. These limits are used
as a basis for judging the significance of the quality variations from sample
to sample, lot to lot or time to time. The moment a point falls outside these
limits it is taken to be a danger signal. The control limits serve as a guide
for action and, therefore, they are also referred to as action limits.
Control limits are established by computation based upon: 

(i) Data covering past and current production records. 


(ii) Statistical formulae whose reliability has been proved in
practice. 


In most control problems, it has been found satisfactory to place the
control limits above and below the grand average of the statistical
measure ($\bar{X}$, $\sigma$, $R$, etc.) that is being plotted at distances of three times a
computed value, commonly designated as the “sigma” of the statistical
measure, for sub-groups of the size under consideration. These are
referred to as “3-sigma” limits.‡ The logic of drawing $3\sigma$ limits is that in
the case of a normal distribution the range $\bar{X} \pm 3\sigma$ covers 99.73 per cent of the items.
In other words, the occurrence of events beyond the limits $\bar{X} \pm 3\sigma$, provided
the events lie on a normal curve, is on the whole nearly 3 out of 1,000
events, an extremely remote chance under normal circumstances. Hence,
if points fall outside 3-sigma limits they indicate the presence of some
assignable cause: not all the variation is due to random causes. It should be noted
that if points fall outside 3-sigma limits, there is good reason for
confidence that they point to some factor contributing to quality variation that
can be identified.
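As a quick check on the 99.73 per cent figure, the short Python sketch below evaluates the normal-curve probability directly. It assumes the plotted statistic is exactly normally distributed and uses only the standard-library function `math.erf`; `normal_coverage` is an illustrative helper name, not part of any library.

```python
import math


def normal_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


inside = normal_coverage(3.0)
outside = 1.0 - inside
print(f"within 3 sigma: {inside:.4f}")
print(f"beyond 3 sigma: {outside:.4f}")
```

Run as written, it should print about 0.9973 and 0.0027, i.e. roughly 3 chances in 1,000 of a point falling beyond the 3-sigma limits by chance alone.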


*Divine, William R. and Harvey Sherman: A Technique for Controlling Quality,
p. 11.

†The control limits should not be confused with specification limits. Specification
limits are set by the designer of the items being produced, and may not have been set
with knowledge of the process capabilities.

‡It should be noted that this value of sigma is not the computed standard
deviation of the plotted points. In the case of the $\bar{X}$, $R$ and $\sigma$ charts it is computed
from the individual observed values within sub-groups and the size $n$ of a sub-group.
The selection of the standard value ($\bar{X}$, $c$, $p$, etc.) is probably the
most basic problem encountered in setting up a control procedure. The
primary aim is not just to get control, but to get control at a satisfactory
level. A satisfactory selection depends fundamentally upon the needs of
the buyer or user as defined by his specifications. Any questions of
cost of production and capability of the manufacturing process must also be
taken into account in deciding on a level that will be economical from an
average point of view.

Types of Control Charts
Broadly speaking, control charts can be divided under two heads : 
(i) Control charts of variables, and 

(ii) Control charts of attributes. 

Variables are those quality characteristics of a product which are
measurable and can be expressed in specific units of measurement, such
as the diameter of radio knobs, which can be measured and expressed in
centimetres, or the tensile strength of cement, which can be expressed in
specific units per square inch, etc. Attributes, on the other hand, are
those product characteristics which are not amenable to measurement.
Such characteristics can only be identified by their presence or absence
from the product. For example, we may say whether a piece of plastic is
cracked or not cracked, or whether the bottles that have been manufactured
contain holes or not. Attributes may be judged either by the proportion of units
that are defective or by the number of defects per unit. Thus the data
resulting from inspection of a quality characteristic may take any one of
the following forms :

(i) A record of the actual measurements of the quality characteristics 
for individual articles or specimens. 

(ii) A record of number of articles or specimens inspected and of the 
number found defective. 

(iii) A record of the number of defects that are found in a sample.
The number of possible defects per sample may be very large compared with the
average number of defects actually found per sample.

For purposes of control, data of the first form (i) listed above may
be summarized by taking two statistical measures, the average ($\bar{X}$) and the
standard deviation ($\sigma$), or the average ($\bar{X}$) and the range ($R$). Data of the
second form (ii) can be summarized in terms of fraction defective ($p$), and
data of type (iii) can be summarized in terms of number of defects per
unit ($c$).

Setting up a Control Procedure 

In establishing basic procedures for the operation of a quality
control programme, the manufacturer must take the following preliminary
steps :

1. Select the quality characteristics that are to be controlled
(including the limits of variation).

2. Analyse the production process to determine the kind and 
location of probable causes of irregularities. 

3. Determine how the inspection data are to be collected and 
recorded, and how they are to be sub-divided. 

4. Choose the statistical measures that are to be used in the 
charts. 
Depending on the type of inspection data available, any one of the
following types of control charts may be used.

1. Control charts for $\bar{X}$ and $\sigma$ and for $\bar{X}$ and $R$. Such charts are used
when measured values of the quality characteristics are at hand.

2. Control chart for $\bar{X}$ alone. The control chart for $\bar{X}$ alone is used
when experience with control charts for $\bar{X}$ and $R$, or $\bar{X}$ and $\sigma$, has
demonstrated that instances of lack of control are almost always
associated with causes that affect $\bar{X}$ rather than $\sigma$ or $R$.

3. Control chart for $\sigma$ or $R$ alone. The control chart for $R$ or $\sigma$ is
used alone where technical reasons render control of $\bar{X}$ unimportant or
where control of $\bar{X}$ is known to be unjustifiably expensive.

4. Control chart for $c$. This chart is used in situations wherein
the inspection consists of determining the number of defects $c$ in
a sample. Such is the case, for example, in the examination of finished
textiles, plywood sheets, etc.

5. Control chart for $p$ or $np$. The chart for $p$ or $np$ is used when the
records of inspection or testing show merely the number of articles
inspected and the number found defective.

These charts are discussed in detail in the following pages. 

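$\bar{X}$ Chart. The $\bar{X}$ chart is used to show the quality averages of the
samples drawn from a given process. The following values must first be
computed before an $\bar{X}$ chart is constructed :

1. Obtain the mean of each sample, i.e., $\bar{X}_1, \bar{X}_2, \bar{X}_3$, etc. This is
done by dividing the sum of the values included in a sample ($\Sigma X$) by the
number of items in the sample ($n$, the sample size):

$$\bar{X} = \frac{\Sigma X}{n}$$

2. Obtain the mean of the sample means, i.e., $\bar{\bar{X}}$. This is done
by dividing the sum of the sample means ($\Sigma \bar{X}$) by the number of samples
to be included in the chart:

$$\bar{\bar{X}} = \frac{\Sigma \bar{X}}{\text{Number of samples}}$$

3. The control limits are set at

$$\text{U.C.L.} = \bar{\bar{X}} + 3\sigma_{\bar{X}}, \qquad \text{L.C.L.} = \bar{\bar{X}} - 3\sigma_{\bar{X}}$$

where

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \sigma = \frac{\bar{R}}{d'}$$

since $\bar{R}$ on its own is a biased estimator of $\sigma$ and $d'$ is the correction factor. The
values of $d'$ are tabulated in the last Appendix at the end of the book.

Therefore, the control limits are

$$\text{U.C.L.} = \bar{\bar{X}} + A_2\bar{R}, \qquad \text{L.C.L.} = \bar{\bar{X}} - A_2\bar{R}$$

where $A_2 = 3/(d'\sqrt{n})$ combines the constants above.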
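The factor $A_2$ simply folds the constants together: substituting $\sigma = \bar{R}/d'$ into $3\sigma/\sqrt{n}$ gives $3\bar{R}/(d'\sqrt{n})$, so $A_2 = 3/(d'\sqrt{n})$. The sketch below assumes the commonly tabulated values $d'$ = 1.128, 1.693, 2.059 and 2.326 for $n$ = 2 to 5 (the factor is often written $d_2$); `a2_factor` and `xbar_limits` are hypothetical helper names.

```python
import math

# Commonly tabulated bias-correction factors d' (also written d2) for the
# mean range, indexed by sub-group size n.
D_PRIME = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def a2_factor(n: int) -> float:
    """A2 = 3 / (d' * sqrt(n)), so that 3 * sigma_xbar = A2 * Rbar."""
    return 3.0 / (D_PRIME[n] * math.sqrt(n))


def xbar_limits(grand_mean: float, mean_range: float, n: int) -> tuple[float, float]:
    """Return (LCL, UCL) for an X-bar chart built from the mean range."""
    half_width = a2_factor(n) * mean_range
    return grand_mean - half_width, grand_mean + half_width


for n in D_PRIME:
    print(n, round(a2_factor(n), 3))
```

For $n$ = 4 and $n$ = 5 it reproduces the tabulated $A_2$ values 0.729 and 0.577 used in the illustrations. Because the factor is computed rather than rounded, limits obtained this way may differ from a hand calculation in the second or third decimal place.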
Illustration 1. A food company puts mango juice into cans advertised as containing
10 ounces of the juice. The weights of the juice drained from the cans immediately
after filling for 20 samples are taken by a random method (at an interval of every 30
minutes). Each of the samples includes 4 cans. The samples are tabulated in the
following table. The weights in the table are given in units of 0.01 ounce in excess of 10
ounces. For example, the weight of juice drained from the first can of the first sample is
10.15 ounces, which is in excess of 10 ounces, the excess being 0.15 ounce (10.15 − 10 = 0.15).
Since the unit in the table is 0.01 ounce, the excess is recorded as 15 units in the table.
Construct an $\bar{X}$ chart to control the weights of mango juice for the filling.


Sample          Weight of each can
Number    (4 cans in each sample, n = 4)
   1       15   12   13   20
   2       10    8    8   14
   3        8   15   17   10
   4       12   17   11   12
   5       18   13   15    4
   6       20   16   14   20
   7       15   19   23   17
   8       13   23   14   16
   9        9    8   18    5
  10        6   10   24   20
  11        5   12   20   15
  12        3   15   18   18
  13        6   18   12   10
  14       12    9   15   18
  15       15   15    6   16
  16       18   17    8   15
  17       13   16    5    4
  18       10   20    8   10
  19        5   15   10   12
  20        6   14   12   14
Solution.

CALCULATIONS FOR $\bar{X}$ CHART

Sample   Weight of each can      Total weight   Sample      Sample
Number   (4 cans in each         of 4 cans      Mean        Range
         sample, n = 4)          $\Sigma X$     $\bar{X}$   $R$
(1)                              (2)            (3)         (4)
   1     15   12   13   20          60          15.0          8
   2     10    8    8   14          40          10.0          6
   3      8   15   17   10          50          12.5          9
   4     12   17   11   12          52          13.0          6
   5     18   13   15    4          50          12.5         14
   6     20   16   14   20          70          17.5          6
   7     15   19   23   17          74          18.5          8
   8     13   23   14   16          66          16.5         10
   9      9    8   18    5          40          10.0         13
  10      6   10   24   20          60          15.0         18
  11      5   12   20   15          52          13.0         15
  12      3   15   18   18          54          13.5         15
  13      6   18   12   10          46          11.5         12
  14     12    9   15   18          54          13.5          9
  15     15   15    6   16          52          13.0         10
  16     18   17    8   15          58          14.5         10
  17     13   16    5    4          38           9.5         12
  18     10   20    8   10          48          12.0         12
  19      5   15   10   12          42          10.5         10
  20      6   14   12   14          46          11.5          8
Total                                          263.0        211
Calculation

(1) The mean of each sample, $\bar{X}$, is given in column (3). For example, $\bar{X}$ for
the first sample is $60/4 = 15$.

(2) The mean of the sample means, $\bar{\bar{X}}$, is obtained from column (3) as follows :

$$\bar{\bar{X}} = \frac{\Sigma \bar{X}}{20} = \frac{263}{20} = 13.15$$

(3) The value of $R$ for each sample is shown in column (4).
For example, the value of $R$ for the first sample is computed as follows :

$$R = 20 - 12 = 8$$

(4) The value of $\bar{R}$, i.e., the mean of the values of $R$, is obtained as

$$\bar{R} = \frac{\Sigma R}{20} = \frac{211}{20} = 10.55$$

(5) [The table value of $A_2$ for $n = 4$ is 0.729.]

$$\text{U.C.L.} = \bar{\bar{X}} + A_2\bar{R} = 13.15 + 0.729 \times 10.55 = 13.15 + 7.69 = 20.84 \text{ approx.}$$

$$\text{L.C.L.} = \bar{\bar{X}} - A_2\bar{R} = 13.15 - 0.729 \times 10.55 = 13.15 - 7.69 = 5.46 \text{ approx.}$$

Note that the values in the above computation are expressed in units of 0.01
ounce in excess of 10 ounces. The actual value for the U.C.L. thus is 10.208 ounces and that
for the L.C.L. is 10.055 ounces.
The control chart for this illustration is given below : 


$\bar{X}$-CHART: DRAINED WEIGHTS OF MANGO JUICE IN CANS
[Figure: sample means plotted against sample number (each sample includes 4 cans); unit of weight 0.01 ounce in excess of 10 ounces; central line at 13.15, U.C.L. at 20.84 and L.C.L. at 5.46.]


Since all the points fall within the control limits, the process is in a state of
control and hence there is nothing to worry about.
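The arithmetic of Illustration 1 can be reproduced in a few lines of Python. The sketch below is only a check on the hand calculation: the data are copied from the table above, and $A_2$ = 0.729 is the tabulated value for $n$ = 4 used in the text.

```python
# Weights of the 20 samples of Illustration 1, in units of 0.01 ounce above 10 ounces.
samples = [
    [15, 12, 13, 20], [10, 8, 8, 14], [8, 15, 17, 10], [12, 17, 11, 12],
    [18, 13, 15, 4], [20, 16, 14, 20], [15, 19, 23, 17], [13, 23, 14, 16],
    [9, 8, 18, 5], [6, 10, 24, 20], [5, 12, 20, 15], [3, 15, 18, 18],
    [6, 18, 12, 10], [12, 9, 15, 18], [15, 15, 6, 16], [18, 17, 8, 15],
    [13, 16, 5, 4], [10, 20, 8, 10], [5, 15, 10, 12], [6, 14, 12, 14],
]
A2 = 0.729  # tabulated A2 for sub-groups of n = 4

means = [sum(s) / len(s) for s in samples]
ranges = [max(s) - min(s) for s in samples]
grand_mean = sum(means) / len(means)    # X double-bar
mean_range = sum(ranges) / len(ranges)  # R-bar

ucl = grand_mean + A2 * mean_range
lcl = grand_mean - A2 * mean_range
# Sample numbers (1-based) whose mean falls outside the limits.
out_of_control = [i + 1 for i, m in enumerate(means) if not lcl <= m <= ucl]

print(f"X double-bar = {grand_mean:.2f}, R-bar = {mean_range:.2f}")
print(f"UCL = {ucl:.2f}, LCL = {lcl:.2f}")
print(f"UCL in ounces = {10 + ucl / 100:.3f}, LCL in ounces = {10 + lcl / 100:.3f}")
print("samples outside the limits:", out_of_control)
```

It should report $\bar{\bar{X}}$ = 13.15, $\bar{R}$ = 10.55, U.C.L. ≈ 20.84 and L.C.L. ≈ 5.46 (about 10.208 and 10.055 ounces), with no sample outside the limits.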

Illustration 2. A drilling machine bores holes with a mean diameter of 0.5230
cm and a standard deviation of 0.0032 cm. Calculate the 2-sigma and 3-sigma upper
and lower control limits for means of samples of 4, and prepare a control chart.

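Solution. We have

$$\bar{X} = 0.5230 \text{ cm}, \quad \sigma = 0.0032 \text{ cm}, \quad n = 4$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.0032}{\sqrt{4}} = 0.0016$$

2-sigma limits for means of samples of 4 :

$$\text{U.C.L.} = \bar{X} + 2\left(\frac{\sigma}{\sqrt{n}}\right) = 0.5230 + 2(0.0016) = 0.5262 \text{ cm}$$

Central line = 0.5230 cm.

$$\text{L.C.L.} = \bar{X} - 2\left(\frac{\sigma}{\sqrt{n}}\right) = 0.5230 - 2(0.0016) = 0.5198 \text{ cm}$$

3-sigma limits for means of samples of 4 :

$$\text{U.C.L.} = \bar{X} + 3\left(\frac{\sigma}{\sqrt{n}}\right) = 0.5230 + 3(0.0016) = 0.5278 \text{ cm}$$

Central line = 0.5230 cm.

$$\text{L.C.L.} = \bar{X} - 3\left(\frac{\sigma}{\sqrt{n}}\right) = 0.5230 - 3(0.0016) = 0.5182 \text{ cm}$$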


CONTROL CHART

[Figure: central line at 0.5230; 2-sigma limits at 0.5262 (U.C.L.) and 0.5198 (L.C.L.); 3-sigma limits at 0.5278 (U.C.L.) and 0.5182 (L.C.L.).]
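When the process mean and standard deviation are known, as in Illustration 2, the limits depend only on the standard error $\sigma/\sqrt{n}$. A minimal sketch, with `sigma_limits` as a hypothetical helper:

```python
import math


def sigma_limits(mean: float, sigma: float, n: int, k: float) -> tuple[float, float]:
    """k-sigma control limits (LCL, UCL) for means of samples of size n,
    when the process mean and standard deviation are known."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    return mean - k * se, mean + k * se


for k in (2, 3):
    lcl, ucl = sigma_limits(0.5230, 0.0032, 4, k)
    print(f"{k}-sigma: LCL = {lcl:.4f} cm, UCL = {ucl:.4f} cm")
```

Taking $k$ = 2 and $k$ = 3 gives the 0.5198/0.5262 cm and 0.5182/0.5278 cm limits obtained above.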


R Chart 

The R chart is used to show the variability or dispersion of the
quality produced by a given process. The R chart (or $\sigma$ chart) is the
companion chart to the $\bar{X}$ chart, and both are usually required for adequate
analysis of the production process under study. The R chart is generally
presented along with the $\bar{X}$ chart. The general procedure for constructing
the R chart is similar to that for the $\bar{X}$ chart. The required values
for constructing the R chart are :

1. The range of each sample, $R$

2. The mean of the sample ranges, $\bar{R}$

3. U.C.L. and L.C.L. :

$$\text{U.C.L.}_R = \bar{R} + 3\sigma_R \quad \text{and} \quad \text{L.C.L.}_R = \bar{R} - 3\sigma_R$$

where $\sigma_R$ = the standard error of the range.

The value of $\sigma_R$ may be estimated by finding the standard deviation
of the ranges of the samples included in a chart. In practice, however,
it is more convenient to compute the upper and lower control limits by
using the values $D_3$ and $D_4$ provided in Table XII of Appendix 3
according to various sample sizes ($n$ = 2 to 20). When the tabulated values
are used, the two limits may be written as follows :

$$\text{U.C.L.}_R = D_4\bar{R}, \qquad \text{L.C.L.}_R = D_3\bar{R}$$
It should be noted that the use of the R chart is recommended only for
relatively small sample sizes (rarely more than 12 or 15 units). For
larger sample sizes ($n > 12$) the $\sigma$ chart is to be preferred.


Illustration 3. Prepare an R chart for the data of Illustration 1.
Solution. The required values for the chart are :

1. The range of each sample, $R$ (please see column 4 of Illustration 1)

2. The mean of the sample ranges :

$$\bar{R} = \frac{\Sigma R}{20} = \frac{211}{20} = 10.55$$

3. The control limits :

$$\text{U.C.L.}_R = D_4\bar{R} = 2.282(10.55) = 24.08$$

$$\text{L.C.L.}_R = D_3\bar{R} = 0(10.55) = 0$$


The control chart for R is shown below :

R-CHART: DRAINED WEIGHTS OF MANGO JUICE IN CANS
[Figure: sample ranges plotted against sample number (each sample includes 4 cans); unit of weight 0.01 ounce in excess of 10 ounces; U.C.L. at 24.08 and L.C.L. at 0.]

The chart shows that the process is under control, since all the R values plotted
on the chart are within the two control limits.
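The R-chart limits of Illustration 3 can be checked in the same way. The sketch assumes the tabulated factors $D_3$ = 0 and $D_4$ = 2.282 for sub-groups of 4, as used in the text, and takes the ranges from column (4) of Illustration 1.

```python
# Sample ranges from column (4) of Illustration 1 (units of 0.01 ounce).
ranges = [8, 6, 9, 6, 14, 6, 8, 10, 13, 18, 15, 15, 12, 9, 10, 10, 12, 12, 10, 8]
D3, D4 = 0.0, 2.282  # tabulated factors for sub-groups of n = 4

mean_range = sum(ranges) / len(ranges)  # R-bar
ucl_r = D4 * mean_range
lcl_r = D3 * mean_range
in_control = all(lcl_r <= r <= ucl_r for r in ranges)

print(f"R-bar = {mean_range:.2f}, UCL = {ucl_r:.2f}, LCL = {lcl_r:.2f}")
print("all ranges within limits:", in_control)
```

The largest range (18, for sample 10) stays well below the upper limit of about 24.08.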


The choice between the $\bar{X}$ chart and the R chart is a managerial problem.
It is better to construct the R chart first. If the R chart indicates that the
dispersion of the quality produced by the process is out of control, it is generally
better not to construct an $\bar{X}$ chart until the quality dispersion is brought under control.


Illustration 4. The table below gives the (coded) measurements obtained in 20
samples (sub-groups). Construct control charts based on the mean and the range. The
values of these statistics are given below the respective samples.


[Table: coded measurements for sub-groups 1 to 20, with the mean $\bar{X}$ and range $R$ of each sub-group shown below it.]


Solution.

$$\bar{\bar{X}} = \frac{\Sigma \bar{X}}{20} = \frac{6.6}{20} = 0.33$$

$$\bar{R} = \frac{\Sigma R}{20} = \frac{63}{20} = 3.15$$


From the table for samples of size 5, we find that*

$$A_2 = 0.577, \quad D_3 = 0 \quad \text{and} \quad D_4 = 2.115$$

Upper and lower control limits for the $\bar{X}$ chart :

$$\bar{\bar{X}} \pm A_2\bar{R} = 0.33 \pm 0.577(3.15) = 0.33 \pm 1.818$$

Lower control limit = 0.33 − 1.818 = −1.488.

Upper control limit = 0.33 + 1.818 = 2.148.

Upper and lower control limits for the R chart :

$$\text{U.C.L.} = D_4\bar{R} = 2.115(3.15) = 6.662, \qquad \text{L.C.L.} = D_3\bar{R} = 0(3.15) = 0$$

The following two control charts are prepared with the help of the above control limits.
* 


CONTROL CHART FOR $\bar{X}$
[Figure: sub-group means (measurements) plotted against sub-group number, with U.C.L. at 2.148.]

*$A_2$, $D_3$ and $D_4$ are given in the ASTM Manual Table, reproduced and presented
at the end of the text. It should be noted that when $n$ is 6 or less, $D_3 = 0$; hence the
lower control limit for R is taken as zero.
CONTROL CHART FOR R
[Figure: sub-group ranges plotted against sub-group number, with U.C.L. at 6.662, central line $\bar{R}$ = 3.15 and L.C.L. at 0.]


The fact that in both graphs all sample points are falling within the 3-sigma 
control limits can be interpreted as implying that the process is in a state of statistical 
control or, in other words, that the only kind of variation present is chance variation. 
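Since the individual measurements of Illustration 4 are summarised by $\bar{\bar{X}}$ = 0.33 and $\bar{R}$ = 3.15, both sets of limits follow from those two figures and the tabulated factors for $n$ = 5 quoted above. A minimal Python sketch of that calculation:

```python
# Summary statistics of Illustration 4: 20 sub-groups of size 5, coded units.
grand_mean = 0.33   # X double-bar
mean_range = 3.15   # R-bar
A2, D3, D4 = 0.577, 0.0, 2.115  # tabulated factors for n = 5

xbar_lcl = grand_mean - A2 * mean_range
xbar_ucl = grand_mean + A2 * mean_range
r_lcl = D3 * mean_range
r_ucl = D4 * mean_range

print(f"X-bar chart: LCL = {xbar_lcl:.3f}, UCL = {xbar_ucl:.3f}")
print(f"R chart:     LCL = {r_lcl:.3f}, UCL = {r_ucl:.3f}")
```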


Control Chart for C (number of defects per unit) 


The C chart is designed to control the number of defects per unit.
It is widely used in statistical work.

The control chart for C is used in situations wherein the opportunity for
defects is large while the actual occurrence tends to be small. Such
situations are described by the Poisson distribution. This happens, for
example, if we count the number of imperfections in a piece of cloth, the
number of air bubbles in a piece of glass, the number of blemishes in a
sheet of paper, etc. Let $C$ stand for the number of defects counted in
one unit of cloth (paper, glass, rolls of wire) and $\bar{C}$ for the mean of the
defects counted in several (usually 25 or more) such units. Then the
central line of the control chart for $C$ is $\bar{C}$. Since the mean and the
variance of a Poisson distribution are equal, the standard deviation of $C$
is estimated by $\sqrt{\bar{C}}$, and the 3-sigma control limits are

$$\text{U.C.L.} = \bar{C} + 3\sqrt{\bar{C}}$$

$$\text{L.C.L.} = \bar{C} - 3\sqrt{\bar{C}}$$

This formula is based on a normal curve approximation to the Poisson
distribution. The use of the C chart is appropriate if the opportunities
for a defect in each production unit are infinite but the probability of a
defect at any point is very small and is constant.
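The 3-sigma limits for the C chart borrow the normal-curve probabilities, so it is worth seeing how far they are from the exact Poisson figures. The sketch below assumes the defect count follows a Poisson distribution with mean $\bar{C}$ = 5 (the value that arises in Illustration 5) and compares the exact chance of a count above the upper limit with the one-sided normal figure of about 0.00135; `poisson_cdf` is an illustrative helper.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(C <= k) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


c_bar = 5.0
ucl = c_bar + 3 * math.sqrt(c_bar)  # about 11.708
largest_in_control = math.floor(ucl)  # counts 0..11 lie within the limit
exact_tail = 1 - poisson_cdf(largest_in_control, c_bar)
normal_tail = 0.5 * math.erfc(3 / math.sqrt(2))  # one-sided normal tail beyond 3 sigma

print(f"UCL = {ucl:.3f}")
print(f"exact Poisson P(C > UCL) = {exact_tail:.4f}")
print(f"normal approximation     = {normal_tail:.5f}")
```

Here the exact tail (about 0.005) comes out several times larger than the normal-curve value, which is one reason to read the 3-sigma rule for C charts as a practical convention rather than an exact probability statement.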


Uniform sample size is highly desirable while using the C chart. 
Where sample size varies particularly if the variation is large, the C chart 
becomes difficult to read, and the p chart (discussed just after the C chart) 
provides a better choice. 


Illustration 5. Assume that 20 1-litre milk bottles are selected at random from
a process. The number of air bubbles (defects) observed in the bottles is given in
the following table :
[Table: bottle number (sample order) and number of air bubbles (defects) $c$ in each bottle, for bottles 1 to 20.]

Total number of defects = 100

Draw a control chart for the above data.

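Solution. We will use the C chart here. The computations required for
preparing this chart are :

(1) $\bar{C}$, i.e., the average number of defects $= 100/20 = 5$

(2) $$\text{U.C.L.} = \bar{C} + 3\sqrt{\bar{C}} = 5 + 3\sqrt{5} = 5 + 3 \times 2.236 = 5 + 6.708 = 11.708$$

(3) $$\text{L.C.L.} = \bar{C} - 3\sqrt{\bar{C}} = 5 - 3\sqrt{5} = 5 - 3 \times 2.236 = 5 - 6.708 = -1.708$$

The control chart is given below :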


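C-CHART: NUMBER OF AIR BUBBLES (DEFECTS) IN EACH BOTTLE
[Figure: number of defects plotted against sample number; the point for the last sample lies above the U.C.L. and is marked as a danger signal.]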


The lower control limit will be recorded as zero, since the number of defects
cannot be negative.

It is clear from the chart that only one point, in respect of the last sample, falls
outside the control limits, and this is to be treated as a danger signal.
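The C-chart computation, including the convention of recording a negative lower limit as zero, can be wrapped in a small function. This is a sketch: `c_chart_limits` is a hypothetical name, and the total of 100 defects is simply $\bar{C}$ = 5 multiplied by the 20 bottles.

```python
import math


def c_chart_limits(total_defects: int, units: int) -> tuple[float, float, float]:
    """Return (LCL, central line, UCL) for a C chart, clamping a negative LCL to zero."""
    c_bar = total_defects / units
    spread = 3 * math.sqrt(c_bar)  # Poisson: the variance equals the mean
    return max(0.0, c_bar - spread), c_bar, c_bar + spread


lcl, centre, ucl = c_chart_limits(100, 20)  # Illustration 5: 100 defects in 20 bottles
print(f"LCL = {lcl:.3f}, C-bar = {centre:.1f}, UCL = {ucl:.3f}")
```

The same function applied to 80 imperfections in 20 rolls gives limits of 0 and 10, and for 140 defects in 20 units it gives an upper limit of about 14.94 (a hand calculation that rounds $\sqrt{7}$ to 2.646 gets 14.938).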
Illustration 6. Twenty pieces of cloth out of different rolls contained respectively
1, 4, 3, 2, 5, 4, 6, 7, 2, 3, 2, 5, 7, 6, 4, 5, 2, 1, 3 and 8 imperfections. Ascertain
whether the process is in a state of statistical control.


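Solution.

$$\bar{C} = \frac{\Sigma C}{20} = \frac{1 + 4 + 3 + 2 + \cdots}{20} = \frac{80}{20} = 4$$

Upper and lower control limits for the number of imperfections per unit of
cloth are :

$$\text{U.C.L.} = \bar{C} + 3\sqrt{\bar{C}} = 4 + 3\sqrt{4} = 4 + 6 = 10$$

$$\text{L.C.L.} = \bar{C} - 3\sqrt{\bar{C}} = 4 - 3 \times 2 = -2$$

The control limits are shown in the following chart :

[Figure: C-chart of the number of imperfections per roll of cloth, plotted against roll number.]

Since none of the points lies outside the control limits, it may be presumed that
the process is in a state of statistical control.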


Illustration 7. The following table gives the number of errors of alignment
observed at final inspection of a certain model of bus. Prepare a C chart and comment
on it.


Bus Number   Number of alignment defects   Bus Number   Number of alignment defects

1001 6 1011 
1002 10 1012 é 
1003 8 1013 10 
1004 7 1014 10
1005 12 1015 6 
1006 9 1016 1. 
1007 5 1017 3
1008 7 1018 п 
1009 3 1019 2 
1010 4 1020 1 
Solution.

$\bar{C}$, i.e., the average number of defects $= 140/20 = 7$.

The control limits and the central line are :

$$\text{U.C.L.} = \bar{C} + 3\sqrt{\bar{C}} = 7 + (3 \times 2.646) = 14.938 \text{ or } 15$$

Central line $= \bar{C} = 7$

$$\text{L.C.L.} = \bar{C} - 3\sqrt{\bar{C}} = 7 - (3 \times 2.646) = -0.938$$

Because the number of defects cannot be negative, the lower limit will be taken
as zero.
CONTROL CHART OF NUMBER OF DEFECTS
[Figure: number of alignment defects plotted against bus number (1001 to 1020), with U.C.L. = 15 and central line $\bar{C}$ = 7.]


Control Chart for p (Fraction Defective) 

The p-chart is designed to control the percentage or proportion of
defectives per sample. Since the number of defectives (c) can be converted
into a percentage expressed as a decimal fraction merely by dividing c by
the sample size, the p chart may be used in place of the c-chart. The
p-chart has at least two advantages over the c chart :

1. Expressing the defectives as a percentage or fraction of production
is more meaningful and more generally understood than a statement of the
number of defectives, since the latter must be related in some way to the
total number produced.

2. Where the size of the sample varies from sample to sample, the
p-chart permits a more straightforward and less cluttered presentation.
The p-chart requires, however, that the division c/n be made. This
additional computation may be regarded as a slight disadvantage.

The same basic data is used for both c as well as p chart. When the 
sample size remains constant from sample to sample, the primary differ- 
ence lies in the computation of the control limits. The c-chart control 
limits are set at c plus or minus three standard deviations. The p-chart 
control limits are set at p plus or minus three standard errors of the 
proportion. 

This chart has its theoretical basis in the binomial distribution, and 
generally gives best results when the sample size is large, say, at least 50. 
The steps in constructing the chart are: 

(i) Compute the average fraction defective ($\bar{p}$) by dividing the number
of defectives by the total number of units inspected.

(ii) On the chart draw a solid horizontal line to represent $\bar{p}$.

(iii) Determine the upper and lower control limits. The upper and
lower control limits are obtained by adding to and subtracting from the
average fraction defective three times its standard error, as follows :

$$\text{U.C.L.} = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}, \qquad \text{L.C.L.} = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$
While constructing the chart it is generally preferred to express results in
terms of ‘per cent defective’ rather than ‘fraction defective’. The per cent
defective is $100p$. Any sample point falling outside the control limits is
evidence of a possible lack of control, inasmuch as the probability of
getting such a value by chance is less than 0.003. The following example
shall illustrate the procedure :


INSPECTION DATA ON COMPLETED SPARK PLUGS
(2,000 spark plugs in 20 lots of 100 each)

Lot      Number of    Fraction    |   Lot      Number of    Fraction
Number   Defectives   Defective   |   Number   Defectives   Defective
  1          5         0.050      |    11          4         0.040
  2         10         0.100      |    12          7         0.070
  3         12         0.120      |    13          8         0.080
  4          8         0.080      |    14          2         0.020
  5          6         0.060      |    15          3         0.030
  6          5         0.050      |    16          4         0.040
  7          6         0.060      |    17          5         0.050
  8          3         0.030      |    18          8         0.080
  9          3         0.030      |    19          6         0.060
 10          5         0.050      |    20         10         0.100

Total      120


Construct an appropriate control chart. 


Solution. Since we are given fraction defectives, the suitable chart will be the
p-chart.

Calculations for the p-chart are :

(1) Average fraction defective :

$$\bar{p} = \frac{120}{2000} = 0.06$$

(2) $$\text{U.C.L.} = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.06 + 3\sqrt{\frac{0.06(1-0.06)}{100}} = 0.06 + 3(0.0237) = 0.06 + 0.0711 = 0.1311$$

(3) $$\text{L.C.L.} = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.06 - 3\sqrt{\frac{0.06(1-0.06)}{100}} = 0.06 - 0.0711 = -0.0111$$

Since the fraction defective cannot be negative, the L.C.L. shall be taken as zero.
P-CHART FOR SPARK PLUG INSPECTION
(20 LOTS OF 100 SPARK PLUGS EACH)
[Figure: fraction defective plotted against lot number.]


The control chart shows that all the points fall within the control limits.
Hence the process is in a state of control.
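The p-chart for the spark plug data can be reproduced as follows. This is a sketch under the text's assumptions (lots of 100, limits at three standard errors of a proportion, a negative lower limit taken as zero); the lot counts are copied from the inspection table.

```python
import math

# Number of defective spark plugs in each of 20 lots of 100 (inspection data above).
defectives = [5, 10, 12, 8, 6, 5, 6, 3, 3, 5, 4, 7, 8, 2, 3, 4, 5, 8, 6, 10]
n = 100  # lot size

p_bar = sum(defectives) / (n * len(defectives))  # 120 / 2000 = 0.06
se = math.sqrt(p_bar * (1 - p_bar) / n)          # standard error of a proportion
ucl = p_bar + 3 * se
lcl = max(0.0, p_bar - 3 * se)                   # a fraction cannot be negative
fractions = [d / n for d in defectives]
outside = [lot for lot, p in enumerate(fractions, start=1) if not lcl <= p <= ucl]

print(f"p-bar = {p_bar:.3f}, UCL = {ucl:.4f}, LCL = {lcl:.4f}")
print("lots outside the limits:", outside)
```

Because the standard error is not rounded to 0.0237 here, the computed upper limit is about 0.1312 rather than 0.1311; no lot falls outside the limits.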

In order to simplify the work of the person who plots the points on the control chart, the above chart can be modified so that he can directly plot the number rather than the fraction or percentage of defectives. Such a chart is called the control chart for number of defectives. To obtain such a chart, the central line as well as the control limits are multiplied by n. The central line thus becomes $n\bar{p}$ and the control limits become

$$n\bar{p} \pm 3\sqrt{n\bar{p}(1-\bar{p})}$$
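As a hedged sketch of this rule, the function below (a hypothetical helper, not from the text) multiplies the p-chart quantities by n. Applied to the spark-plug figures ($\bar{p} = 0.06$, n = 100) it gives a central line of 6 defectives and an upper limit of about 13.12, which is simply 100 times the fraction-defective limits.

```python
from math import sqrt

def np_chart_limits(p_bar, n):
    """Return (LCL, central line, UCL) for a chart of the number of defectives."""
    centre = n * p_bar
    sigma = sqrt(n * p_bar * (1 - p_bar))   # standard deviation of the count
    lcl = max(0.0, centre - 3 * sigma)      # a count cannot be negative
    return lcl, centre, centre + 3 * sigma

lcl, centre, ucl = np_chart_limits(0.06, 100)
print(f"LCL = {lcl:.2f}, central line = {centre:.2f}, UCL = {ucl:.2f}")
```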

Illustration 9. The following data refer to visual defects found at inspection of the first 10 samples of size 100. Use the data to obtain upper and lower control limits for percentage defective in samples of 100. Represent the first ten sample results in the chart you prepare to show the central line and control limits:

Sample No.            1   2   3   4   5   6   7   8   9   10   Total
No. of defectives     2   1   1   3   2   3   4   2   2    0      20

(M. Com., Meerut, 1973)
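Solution.

Since there are 20 defective items in 10 samples each of size 100,

$$\bar{p} = \text{Average fraction defective} = \frac{20}{10 \times 100} = 0.02$$

Also, n = 100, so that

$$\text{Central line} = n\bar{p} = 100 \times 0.02 = 2$$

$$\sqrt{n\bar{p}(1-\bar{p})} = \sqrt{100 \times 0.02 \times 0.98} = \sqrt{1.96} = 1.4$$

$$\text{U.C.L.} = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})} = 2 + (3 \times 1.4) = 6.2$$

$$\text{L.C.L.} = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})} = 2 - (3 \times 1.4) = -2.2$$

Since the number of defectives cannot be negative, the L.C.L. is taken as zero. As each sample contains 100 items, the number of defectives in a sample is also its percentage defective, so these limits serve directly for percentage defective.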
[Figure: Control chart for number of defectives; vertical axis: number of defectives; horizontal axis: sample number.]


Illustration 10. Construct a control chart for the proportion of defectives obtained in repeated random samples of size 100 from a process which is considered to be under control when the proportion of defectives p is equal to 0.20. Draw the central line and the upper and lower control limits on graph paper.
(M. Com., Meerut, 1974)
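Solution. We are given

p = Average fraction defective = 0.20, n = 100

$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.20 \times 0.80}{100}} = \sqrt{0.0016} = 0.04$$

$$\text{U.C.L.} = p + 3\sqrt{\frac{p(1-p)}{n}} = 0.20 + 3(0.04) = 0.32$$

$$\text{Central line} = p = 0.20$$

$$\text{L.C.L.} = p - 3\sqrt{\frac{p(1-p)}{n}} = 0.20 - 3(0.04) = 0.08$$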
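When, as here, the process fraction defective p is specified in advance rather than estimated from the data, the same formula is simply applied with that standard value. A minimal sketch, with a hypothetical function name:

```python
from math import sqrt

def p_chart_standard(p, n):
    """3-sigma limits when the process fraction defective p is specified, not estimated."""
    sigma = sqrt(p * (1 - p) / n)
    return max(0.0, p - 3 * sigma), p, p + 3 * sigma

lcl, cl, ucl = p_chart_standard(0.20, 100)
print(f"LCL = {lcl:.2f}, central line = {cl:.2f}, UCL = {ucl:.2f}")
# prints: LCL = 0.08, central line = 0.20, UCL = 0.32
```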


[Figure: Control chart for proportion defectives, showing the U.C.L. (0.32), the central line (0.20) and the L.C.L. (0.08).]
Illustration 11. A new shearing machine is set to cut off a piece of steel from a long bar. For various reasons the machine at times cuts off a piece that is too long or too short. These unacceptable pieces are automatically dropped in a box and the operator of the shearing machine must count these defectives after every 100 pieces are sheared off. The record after the first day of operation is:

Number sheared off    Number of defectives
       100                     5
       100                     6
       100                     7
       100                     4
       100                     8

Set out upper and lower control limits. (B.A., Bombay, 1973)
Solution.

UPPER AND LOWER CONTROL LIMITS

Number sheared off   Number of defectives   Fraction defective
       100                    5                   0.05
       100                    6                   0.06
       100                    7                   0.07
       100                    4                   0.04
       100                    8                   0.08
Total  500                   30                   0.30

$$\bar{p} = \frac{0.30}{5} = 0.06$$

$$\text{U.C.L.} = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.06 + 3\sqrt{\frac{0.06(1-0.06)}{100}} = 0.06 + 3(0.024) = 0.06 + 0.072 = 0.132$$

$$\text{L.C.L.} = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.06 - 0.072 = -0.012$$

Hence the upper control limit is 0.132 and the lower control limit is zero, since there cannot be less than zero per cent defective.
Advantages and Limitations of Statistical Quality Control 


Advantages. Statistical quality control is one of the tools of scientific management. It has several advantages over 100 per cent inspection. These are:

(i) Reduction in costs. Since only a fraction of output is inspected, 
costs of inspection are greatly reduced. 

(ii) Greater efficiency. Not only is there a reduction in costs, but efficiency also goes up, because much of the boredom is avoided when the work of inspection is considerably reduced.

(iii) Easy to apply. An excellent feature of quality control is that it is easy to apply. Once the system is established, it can be operated by persons who have not had extensive specialized training or a highly mathematical background. It may appear difficult only because the statistical principles on which it is based are unrecognised or unknown. However, as these principles are actually based on common sense, the quality control method finds wide application.


(iv) Early detection of faults. Quality control ensures early detection of faults and hence a minimum waste of rejected production. The moment a sample point falls outside the control limits, it is taken to be a danger signal and necessary corrective action is taken. With 100 per cent inspection, on the other hand, unwanted variations in quality may be detected only at a stage when a large amount of faulty product has already been produced, resulting in a big wastage. A control chart, by contrast, provides a graphic picture of how production is proceeding and tells management where not to look for trouble.

(v) Adherence to specifications. Quality control enables a process to be brought into and held in a state of statistical control, i.e., a state in which variability is the result of chance causes alone. So long as statistical control continues, conformance to specifications can be accurately predicted for the future, which even 100 per cent inspection cannot guarantee. Consequently, it is possible to assess whether the production processes are capable of turning out products which will comply with the given set of specifications.

(vi) The only course. In certain cases 100% inspection cannot be carried out without destroying all the products inspected; for example, testing the breaking strength of chalk, proofing of ammunition, etc. If 100% inspection were followed in such cases, all the items inspected would be spoiled, so sampling must be resorted to. The application of SQC techniques ensures not only that the quality is controlled but also that valid inferences about the total output are drawn from the samples.

(vii) To determine the effect of change in process. With the help of 
control charts one can easily detect whether or not a change in the pro- 
duction process results in a significant change in quality. 

(viii) Statistical quality control ensures overall co-ordination. Statistical quality control provides a basis upon which differences arising among the various interests in an organization can be resolved. In some instances, for example, production engineers may set specifications that are so "tight" that the operating staff cannot meet them economically, and consequently there is an unnecessarily high scrapping rate. In other instances, the specifications may be too loose, and product quality will be sacrificed unnecessarily. In either case, the control records provide a valuable aid in getting the operating and engineering forces together on the basis of a common understanding. Information on plant capabilities and customer requirements must also be considered in relation to the quality control limits and records of performance; finally, it should be possible to determine the best practical balance between the cost of quality and the sales value of the product.

SQC has a special role to play in a country like India because of the extraordinary variations encountered in raw materials and in machines. The importance of applying SQC has become greater in our industries in the context of the need to earn foreign exchange by supplying quality goods that can compete successfully in world markets.
Limitations

Despite the great significance of statistical quality control, it should be remembered that it is not a panacea for all quality evils. The techniques of quality control should not be used mechanically; rather, they should be matched to the process being studied. The application of standard procedures without adequate study of the process is extremely dangerous, and has in the past led to statistical methods being discredited. Statistical methods applied to a production process are only an information service, and as such must be conditioned by the process to which they are applied. Unless they are used as part of a general quality awareness, they may only lead to a false sense of security. The responsibility for quality and process decisions rests with the manager in charge of the process and not with the statistician. The charts do not reduce the manager's responsibility.

ACCEPTANCE SAMPLING 


The control charts described above cannot be applied to all types of problems. They are useful only for the regulation of the manufacturing process. Another important field of quality control is acceptance sampling. Inspection for acceptance purposes is carried out at many stages in manufacturing. For example, there may be inspection of incoming materials and parts, process inspection at various points in the manufacturing operations, final inspection by a manufacturer of his own product, and ultimately inspection of the finished product by one or more purchasers. Much of the acceptance inspection is carried out on a sampling basis. The use of sampling inspection by a purchaser to decide whether or not to accept a shipment of product is known as acceptance sampling. A sample of the shipment is inspected, and if the number of defective items is more than a stated number, known as the acceptance number, the shipment is rejected. The standards in acceptance inspection are set according to what is required of the product, rather than by the inherent capabilities of the process as in process control. The purpose of acceptance sampling is, therefore, to decide whether to accept or reject a product; it does not attempt to control quality during the manufacturing process, as do the techniques described earlier in the chapter. Sampling inspection may also be referred to as product control, because it is designed to provide decision procedures under which a lot will be accepted or rejected.


Acceptance sampling procedures, which were perfected during World War II to meet military needs for quick and accurate inspection of vast supplies of material, are now used widely in industry. A typical application of acceptance inspection is to determine whether a batch of items, called an inspection lot or simply a lot, that has been delivered by a supplier is of acceptable quality. Another application is to a lot that is complete and ready for shipment, to make sure that it is of adequate quality. Still another application is in the case of partly completed material, to determine whether the lot is of sufficiently high quality to justify further processing.


Role of Acceptance Sampling 
Acceptance sampling is very widely used in practice because of the 
following reasons : 
1. Acceptance sampling is much less expensive than 100 per cent inspection.

2. In many cases it provides better outgoing quality. It is generally agreed that good 100 per cent inspection will remove only about 85 to 95 per cent of the defective material. Very good 100 per cent inspection will remove 99 per cent of the defective items but still not reach 100 per cent. Because of the inspection fatigue involved in 100 per cent inspection, a good sampling plan may actually give better quality assurance than 100 per cent inspection. The word 'good' is stressed since many informal sampling plans devised without knowledge of the laws of chance are practically worthless. Hence there is a need for devising an appropriate acceptance sampling plan.

3. In modern manufacturing plants, acceptance sampling is used 
for evaluating the acceptability of incoming lots of raw materials and 
parts at various stages of manufacture, and final inspection of finished 
product. 

4. Where quality can be tested only by destroying items, as in determining the strength of glass containers, 100 per cent inspection is out of the question and sampling must be used. Of course, there are situations where 100 per cent inspection cannot be set aside; for example, in testing rifles to be used by soldiers, we cannot risk imperfection in any item and therefore must test each and every rifle.

Since under a sampling inspection plan the decision whether to accept or reject a lot is made on the basis of a sample, there is a possibility of (1) rejecting a lot as unsatisfactory when it is of acceptable quality, and (2) accepting a lot as satisfactory when in fact it is below the quality level. Hence in any acceptance sampling plan the producers and the consumers, the sellers and the buyers, are exposed to some risks, which are called the producer's and consumer's risks. The producer's risk is the risk a producer takes that a lot will be rejected by a sampling plan even though it conforms to requirements. This is equivalent to the concept of type I error, or the probability of rejecting a hypothesis when it is in fact true. The consumer's risk is the risk that a lot of unsatisfactory quality will be accepted by a sampling plan. It is equivalent to type II error, which is the probability of accepting a hypothesis when an alternative is true. Before agreeing to an acceptance criterion, the consumers and producers would like to know the risks to which they are exposed, i.e., the probability of rejecting a good lot and the probability of accepting a bad one.

An inspection plan can easily be constructed if the consumers and 
producers specify these probabilities and also the proportion of defectives 
above which a lot is considered to be bad and the proportion of defectives 
below which a lot is considered to be good. 
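The construction described above can be sketched as a brute-force search, under stated assumptions: the Poisson approximation is used for the OC curve, and the search returns the smallest sample size n (with its smallest acceptance number c) giving $P_a \ge 1-\alpha$ at the good quality level and $P_a \le \beta$ at the bad quality level. The inputs in the usage line (good lots at 1% defective, bad lots at 5%, $\alpha$ = 5%, $\beta$ = 10%) are hypothetical, and the function names are illustrative only.

```python
from math import exp

def poisson_cdf(c, lam):
    """P(X <= c) for a Poisson variable with mean lam."""
    term = total = exp(-lam)
    for k in range(1, c + 1):
        term *= lam / k          # builds lam**k / k! incrementally
        total += term
    return total

def design_single_plan(aql, ltpd, alpha=0.05, beta=0.10, max_n=2000):
    """Smallest (n, c) meeting both the producer's and the consumer's risk."""
    for n in range(1, max_n + 1):
        for c in range(0, n + 1):
            if poisson_cdf(c, n * ltpd) > beta:
                break            # a larger c only raises P_a for bad lots
            if poisson_cdf(c, n * aql) >= 1 - alpha:
                return n, c
    return None                  # no plan found up to max_n

print(design_single_plan(0.01, 0.05))
```

With these hypothetical inputs the search returns n = 134 and c = 3. A real plan would normally be read from published sampling tables; the loop only shows how the four quantities pin down n and c.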

Types of Acceptance Sampling Plans 


The following three types of acceptance sampling plans are 
commonly used : 

1. Single Sampling Plan. When the decision whether to accept or reject a lot is always made on the basis of only one sample, the acceptance plan is described as a single sampling plan. This is the simplest type of sampling plan. In any systematic plan for single sampling, three things are specified, namely: (a) Number of items N in the lot from which the sample is to be drawn. (b) Number of articles n in the random sample drawn from the lot. (c) The acceptance number c. This acceptance number is the maximum allowable number of defective articles in the sample. More than this will cause the rejection of the lot. Thus a sampling plan may be specified in this way:

N=200 

n=20 

c=1 

These three numbers may be interpreted as saying: "Take a random sample of 20 from a lot of 200. If the sample contains more than 1 defective, reject the lot; otherwise accept the lot."
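The plan N = 200, n = 20, c = 1 can be written as a decision rule. Because the lot is finite, its exact probability of acceptance for a lot containing D defectives follows the hypergeometric distribution. The sketch below is illustrative only: the function names are hypothetical, the lot with 10 defectives in the usage line is a made-up example, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def accept_lot(defectives_found, c):
    """Single sampling rule: accept if the sample has c or fewer defectives."""
    return defectives_found <= c

def prob_accept_hypergeometric(N, n, c, D):
    """Exact P(accept) when n items are drawn without replacement from a lot
    of N items that contains D defectives."""
    favourable = sum(comb(D, k) * comb(N - D, n - k) for k in range(c + 1))
    return favourable / comb(N, n)

print(accept_lot(1, c=1), accept_lot(2, c=1))
print(f"P(accept | 10 defectives in 200) = {prob_accept_hypergeometric(200, 20, 1, 10):.3f}")
```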

2. Double Sampling Plan. In the single sampling plan discussed above, the decision with regard to acceptance or rejection of a lot is based on the evidence of only one sample from the lot. Double sampling, however, involves the possibility of putting off the decision on the lot until a second sample has been taken. A lot may be accepted at once if the first sample is good enough or rejected at once if the first sample is bad enough. If the first sample is neither good enough nor bad enough, the decision is based on the evidence of the first and second samples combined. In a double sampling plan five things are specified: $n_1$, $c_1$, $n_2$, $n_1 + n_2$ and $c_2$.

$n_1$: Number of pieces in the first sample.

$c_1$: Acceptance number for the first sample (the maximum number of defectives that will permit the acceptance of the lot on the basis of the first sample).

$n_2$: Number of pieces in the second sample.

$n_1 + n_2$: Number of pieces in the two samples combined.

$c_2$: Acceptance number for the two samples combined (the maximum number of defectives that will permit the acceptance of the lot on the basis of the two samples).

Thus a double sampling plan may be:

$$N = 500, \quad n_1 = 20, \quad c_1 = 1, \quad n_2 = 60, \quad n_1 + n_2 = 80, \quad c_2 = 4$$

This will be interpreted as follows:

(i) Inspect a first sample of 20 from a lot of 500.

(ii) Accept the lot on the basis of the first sample if it contains 0 or 1 defective.

(iii) Reject the lot on the basis of the first sample if the sample contains more than 4 defectives.

(iv) Inspect a second sample of 60 if the first sample contains 2, 3 or 4 defectives.

(v) Accept the lot on the basis of the combined sample of 80 if the combined sample contains 4 or fewer defectives.

(vi) Reject the lot on the basis of the combined sample if the combined sample contains more than 4 defectives.
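Rules (i) to (vi) can be written as a small decision function. This is a sketch only: the function name and its return strings are hypothetical, the defaults $c_1 = 1$ and $c_2 = 4$ are taken from the example plan, and a first sample with more than $c_2$ defectives is treated as an immediate rejection, since a second sample is called for only when the first contains 2, 3 or 4 defectives.

```python
def double_sampling_decision(d1, d2=None, c1=1, c2=4):
    """Decide on a lot under a double sampling plan.

    d1 -- defectives in the first sample
    d2 -- defectives in the second sample, or None if it has not been drawn
    Returns "accept", "reject" or "take second sample".
    """
    if d1 <= c1:
        return "accept"              # good enough on the first sample
    if d1 > c2:
        return "reject"              # bad enough on the first sample
    if d2 is None:
        return "take second sample"  # c1 < d1 <= c2: decision deferred
    return "accept" if d1 + d2 <= c2 else "reject"

print(double_sampling_decision(1))      # accept
print(double_sampling_decision(3))      # take second sample
print(double_sampling_decision(3, 2))   # reject: 5 defectives in 80
```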


Advantages of Double Sampling Plan

A double sampling plan has two possible advantages over a single sampling plan:

(i) It may reduce the total amount of inspection. The first sample taken is smaller than that called for under a comparable single sampling plan and, consequently, in all cases in which a lot is accepted or rejected on the basis of the first sample, there may be a considerable saving in total inspection. It is also possible to reject a lot without completely inspecting the entire second sample.

(ii) A double sampling plan has the psychological advantage of giving a lot a second chance. To some people, especially the producer, it may seem unfair to reject a lot on the basis of a single sample. Double sampling permits the taking of two samples on which to make a decision.

3. Multiple or Sequential Sampling Plan. Just as double sampling plans may defer the decision on acceptance or rejection until a second sample has been taken, other plans may permit any number of samples before a decision is reached. Plans permitting from three up to an unlimited number of samples are described as multiple or sequential. However, such plans are quite complicated and rarely used in practice.
Selection of a Sampling Plan 

All practical sampling plans have an operating characteristic curve, briefly called the OC curve. The following points need emphasis regarding the OC curve:

1. There is some chance that good lots will be rejected.

2. There is some chance that bad lots will be accepted.

3. These risks can be calculated by the theory of probability and depend on the number of items inspected (the sample size), the acceptance number, and the per cent defective in the lots submitted for sample inspection. Given the amount of risk which can be tolerated, a sampling plan can be devised to meet these requirements.

4. The larger the sample used in sample inspection, the nearer the 
OC curve approaches the ideal. However, beyond a certain point, the 
added cost in inspecting a larger number of parts far exceeds the benefits 
derived. 

In summary, the two parameters of an OC curve are the sample size and the acceptance number. The desired quality levels (p) and the probability of acceptance ($P_a$) must be selected so that the proper sampling plan can be designed.

There are four factors which should be decided in a sampling plan:

1. $p_a$, also known as AQL (the Acceptable Quality Level). This is the definition of a good lot.

2. $p_r$, also known as RQL (the Rejectable Quality Level) or LTPD (Lot Tolerance Per cent Defective). This is the definition of a bad lot.

3. $\alpha$, also known as the Producer's Risk. This is the probability of rejecting a good lot.

4. $\beta$, also known as the Consumer's Risk. This is the probability of accepting a poor lot.

Construction of an OC curve 

An OC curve can be determined by using either the Poisson distribution or the Thorndike chart. The Poisson distribution can be used in all situations where p is less than 0.10 (or if pn is less than 5) and the lot size is at least 10 times the size of the sample.

In a situation in which these conditions are not met, the theoretically correct approach is to use the binomial or the hypergeometric distribution. However, for most industrial situations the Poisson distribution can be used without serious loss of accuracy.

To use the Thorndike chart, the following procedure is followed. For each possible value of the lot fraction defective p, the product pn is computed. The Thorndike chart is then used to find the probability of c or fewer defective units. For example, for a lot that is 5 per cent defective (p = 0.05) and a sample size of 100 (n = 100), i.e., pn = 5, the probability of selecting 2 or fewer defectives is found from the Thorndike chart to be approximately 0.12. If the lot fraction defective is 1 per cent (p = 0.01) and the sample size is 100 (n = 100), i.e., pn = 1, the probability of selecting 2 or fewer defectives is found from the Thorndike chart to be approximately 0.92. These results give two points on the OC curve for the sampling inspection plan where the sample size is 100 and the acceptance number is 2. Other points may be calculated in the same way.
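The Thorndike chart is essentially a graph of cumulative Poisson probabilities, so the two readings quoted above can be checked directly. The sketch below (function name illustrative) computes $P(X \le c)$ for a Poisson variable with mean pn; for c = 2 it gives about 0.125 when pn = 5 and about 0.920 when pn = 1, in line with the chart readings of 0.12 and 0.92.

```python
from math import exp, factorial

def poisson_cdf(c, lam):
    """Probability of c or fewer defectives when the expected number is lam = pn."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(c + 1))

n, c = 100, 2
for p in (0.05, 0.01):
    print(f"p = {p}, pn = {n * p:g}: P(2 or fewer) = {poisson_cdf(c, n * p):.3f}")
```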
The Operating Characteristic (OC) Curve 

In judging various acceptance sampling plans it is desirable to com- 
pare their performance over a range of possible quality levels of submitted 
product. An excellent picture of this performance is given by the ope- 
rating characteristic curve. Such curves are commonly referred to as 
OC curves. The OC curve of an acceptance sampling plan shows the 
ability of the plan to distinguish between good and bad lots. For any 
given fraction defective p in a submitted lot, the OC curve shows the 
probability $P_a$ that such a lot will be accepted by the given sampling plan
or in other words the OC curve shows the long-run percentage of sub- 
mitted lots that would be accepted if a great many lots of any stated 
quality were submitted for inspection. In drawing the OC curve, the 
following two terms are important: 
AQL and LTPD 

In order to measure the consumer's risk, we must define the maximum percentage of defective items in lots which the consumer is willing to accept. This is called the Lot Tolerance Percentage Defective or LTPD. Similarly, to measure the producer's risk, we define a percentage of defective items at or below which a lot should be accepted; this is known as the Acceptable Quality Level or AQL. The producer's risk is now defined as the probability that a lot having the AQL will be rejected, and the consumer's risk as the probability that a lot having the LTPD will be accepted.
These risks are usually taken as 5% and 10% respectively. The actual 
levels of the AQL and LTPD must be decided by negotiations between 
the consumer and the producer. 
Shape of an Ideal OC Curve 

The ideal OC curve would be one for which all good lots are 
accepted and all bad lots are rejected. Such a curve would look like the 
one given on the next page. 

No sampling plan can have an OC curve of this type. The degree 
to which an actual OC curve approximates the ideal curve depends upon 
n and c, n representing the sample size and c the acceptance number, or the number of defectives in the sample which is not to be exceeded.
Shape of a Typical OC Curve 


A typical OC curve resembles the following diagram : 
[Figure: OC curve for a typical sampling plan.]

The points on the horizontal scale represent possible lot or process qualities, and the height of the curve shows the probability that a lot of this quality will be accepted, assuming the specified sampling plan is in use.

In the above diagram it has been assumed that the acceptable and rejectable qualities are measured as proportions of the items that are defective and are $p_a = 0.05$ and $p_r = 0.15$; from the OC curve, the producer's and consumer's risks are seen to be both a little more than 0.10 in this example. (The sampling plan of the above diagram calls for accepting the lot if three or fewer defectives are found in a sample of 40.)
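The two risks quoted for the plan n = 40, c = 3 can be checked with the binomial distribution, on the assumption (the lot size is not stated) that the lot is large compared with the sample. In this sketch the producer's risk is $1 - P_a$ at $p_a = 0.05$ and the consumer's risk is $P_a$ at $p_r = 0.15$; both come out at roughly 0.13 to 0.14, consistent with "a little more than 0.10". The function name is illustrative.

```python
from math import comb

def binomial_accept(n, c, p):
    """P(c or fewer defectives in a sample of n) when the lot fraction defective is p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, c = 40, 3
producer_risk = 1 - binomial_accept(n, c, 0.05)   # lot at the acceptable quality rejected
consumer_risk = binomial_accept(n, c, 0.15)       # lot at the rejectable quality accepted
print(f"producer's risk = {producer_risk:.3f}, consumer's risk = {consumer_risk:.3f}")
```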

The steepness of the OC curve depends upon the sample size. The larger the sample, the steeper the curve, and the smaller the zone between the qualities that are almost always accepted and the qualities that are almost always rejected.

The location of the OC curve is determined by the maximum number 
of defective items allowable for acceptance, called the acceptance number. 
If the acceptance number is made large, the curve is shifted to the right. 
If the acceptance number is made smaller, the curve is shifted to the left. 
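A quick numerical check of these two statements is sketched below, using the Poisson approximation (an assumption) and a few hypothetical plans. Raising c with n fixed raises $P_a$ at any positive fraction defective, i.e., it moves the curve to the right. Doubling n together with c makes acceptance more likely for a good lot and less likely for a bad one, i.e., a steeper curve.

```python
from math import exp, factorial

def prob_accept(n, c, p):
    """Poisson approximation to P(accept) for the single sampling plan (n, c)."""
    lam = n * p
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(c + 1))

# Location: a larger acceptance number raises P_a at the same quality level.
for c in (2, 3, 4):
    print(f"n = 40, c = {c}: P_a at 10% defective = {prob_accept(40, c, 0.10):.3f}")

# Steepness: doubling n and c separates good lots from bad lots more sharply.
for n, c in ((40, 3), (80, 6)):
    print(f"n = {n}, c = {c}: P_a(5%) = {prob_accept(n, c, 0.05):.3f}, "
          f"P_a(15%) = {prob_accept(n, c, 0.15):.3f}")
```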
Illustration 12. For a sampling plan N = 1,200, n = 64 and c = 1, determine the probability of acceptance of the following lots:
(i) 0.5% defective
(ii) 0.8% defective
(iii) 1% defective
(iv) 2% defective
(v) 4% defective
(vi) 10% defective
Also draw an OC curve.
Solution.

If the lot is 0.5% defective, the samples from it will also have an average of 0.5% defective. Hence in a sample of size 64 the average number of defectives will be $64 \times 0.005 = 0.32$. If the sample contains 1 or 0 defectives, the lot is to be accepted under the sampling plan. We can obtain the cumulative probability of drawing a sample of 64 containing 0 or 1 defective by using the Poisson approximation to the binomial distribution, with $P(0) = e^{-m}$ and $P(1) = m e^{-m}$, where $m = np$. The calculations are as follows:

S.No.   % defective   Average number of      P(0)     P(1)     P(0) + P(1)
        in the lot    defectives, m = np
(a)        0.5        64 × 0.005 = 0.32      0.726    0.232      0.959
(b)        0.8        64 × 0.008 = 0.512     0.599    0.307      0.906
(c)        1          64 × 0.01  = 0.64      0.527    0.337      0.865
(d)        2          64 × 0.02  = 1.28      0.278    0.356      0.634
(e)        4          64 × 0.04  = 2.56      0.077    0.198      0.275
(f)       10          64 × 0.10  = 6.40      0.002    0.011      0.012
[Figure: OC curve for single sampling; vertical axis: probability of acceptance P(a); horizontal axis: p, percentage defective in the lot.]
The value 0.96 represents the probability of drawing a sample of 64 with 0 or 1 defective from a lot known to be 0.5% defective. Conversely, we can state that this sampling plan will lead to the acceptance of about 96 per cent of lots containing 0.5 per cent defectives. In other words, if 1,000 such lots are submitted for inspection under the sampling plan, on an average 960 lots will be accepted and 40 will be rejected.


If we take the probabilities of acceptance ($P_a$) on the Y-axis and the percentage defective in the lots submitted on the X-axis, and join the various points, the curve so formed is known as the operating characteristic (OC) curve of the sampling plan. From the OC curve we can also easily obtain the probability of rejection: $1 - P_a$ gives the probability of rejecting a lot having any specified proportion or percentage defective.
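The table of Illustration 12 can be regenerated with the sketch below, which applies the same Poisson approximation ($m = np$ and $P_a = e^{-m}(1 + m)$ for c = 1). The helper name `oc_points` is illustrative; plotting the returned pairs gives the OC curve.

```python
from math import exp

def oc_points(n, percents):
    """(per cent defective, P(accept)) pairs for a single plan with c = 1."""
    points = []
    for pct in percents:
        m = n * pct / 100                  # average number of defectives in the sample
        pa = exp(-m) + m * exp(-m)         # P(0) + P(1)
        points.append((pct, pa))
    return points

for pct, pa in oc_points(64, [0.5, 0.8, 1, 2, 4, 10]):
    print(f"{pct:>4}% defective: P(accept) = {pa:.3f}")
```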


Illustration 13. Draw an OC curve of the double sampling plan given that $N = 1000$, $n_1 = 50$, $c_1 = 1$, $n_2 = 25$, $c_2 = 2$. (M. Com., Bombay, 1974)


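Solution. This sampling plan means that a sample of size 50 is drawn; if it gives 0 or 1 defective, the lot is accepted. If it gives 3 or more defectives, the lot is rejected. But if it gives 2 defectives, then a second sample of 25 is drawn. If the total number of defectives is 2 (that is, the second sample contains no defective), the lot is accepted; if it is more than 2, the lot is rejected. For various values of the percentage defective x in the lot, the probability of acceptance is obtained from the Poisson probabilities

$$P(0) = e^{-m}, \qquad P(1) = e^{-m}\,m, \qquad P(2) = \frac{e^{-m}\,m^2}{2!}, \ \ldots$$

$$P_a = P(0) + P(1) + P(2) \times P'(0)$$

where the first three probabilities refer to the first sample ($m = 50x/100$) and $P'(0)$ refers to the second sample ($m = 25x/100$).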


          First sample (n1 = 50)                   |  Second sample (n2 = 25)  |   Combined
  x    m    P(0)    P(1)   P(0)+P(1)   P(2)        |   m      P'(0)            |  P(2)×P'(0)    P(a)
  0    0   1.000   0.000     1.000    0.000        |   0      1.000            |    0.000      1.000
  2    1   0.368   0.368     0.736    0.184        |  0.5     0.607            |    0.112      0.847
  4    2   0.135   0.271     0.406    0.271        |  1.0     0.368            |    0.100      0.506
  6    3   0.050   0.149     0.199    0.224        |  1.5     0.223            |    0.050      0.249
  8    4   0.018   0.073     0.092    0.147        |  2.0     0.135            |    0.020      0.111
 10    5   0.007   0.034     0.040    0.084        |  2.5     0.082            |    0.007      0.047
 12    6   0.002   0.015     0.017    0.045        |  3.0     0.050            |    0.002      0.020


Let us plot these points on graph paper to get the required OC curve.
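The double sampling calculation generalises naturally. The sketch below (hypothetical function names, Poisson approximation as in the text) adds the probability of acceptance on the first sample to the probability of reaching the second sample with $d_1$ defectives and then finding no more than $c_2 - d_1$ defectives in it. With the plan of Illustration 13 it reproduces the P(a) column of the table.

```python
from math import exp, factorial

def poisson_pmf(k, m):
    """Poisson probability of exactly k defectives when the mean is m."""
    return exp(-m) * m**k / factorial(k)

def double_plan_pa(x, n1=50, c1=1, n2=25, c2=2):
    """P(accept) for a lot that is x per cent defective (Poisson approximation)."""
    m1, m2 = n1 * x / 100, n2 * x / 100
    # accepted outright on the first sample
    pa = sum(poisson_pmf(k, m1) for k in range(c1 + 1))
    # c1 < d1 <= c2 defectives: the second sample may add at most c2 - d1 more
    for d1 in range(c1 + 1, c2 + 1):
        pa += poisson_pmf(d1, m1) * sum(poisson_pmf(k, m2) for k in range(c2 - d1 + 1))
    return pa

for x in range(0, 13, 2):
    print(f"x = {x:>2}%: P(a) = {double_plan_pa(x):.3f}")
```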
MISCELLANEOUS ILLUSTRATIONS


Illustration 14 (a). Based on 15 sub-groups, each of size 200, taken at intervals from a manufacturing process, the average fraction defective was found to be 0.068. Calculate the value of the central line and the upper and lower control limits.


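Solution. Central line = $\bar{p}$ = 0.068

$$\text{UCL} = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.068 + 3\sqrt{\frac{0.068(1-0.068)}{200}} = 0.068 + 0.0534 = 0.1214$$

$$\text{LCL} = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.068 - 0.0534 = 0.0146$$

Here n is the size of each sub-group (200), since the limits are meant for the fraction defective of an individual sub-group.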

(b) A machine is designed to produce ball bearings having a mean diameter of 0.574 cm and a standard deviation of 0.008 cm. To determine whether the machine is in proper working order, a sample of 6 ball bearings is taken every 2 hours on all the working days (namely Monday to Friday) of the week, and the mean diameter is computed from this sample. Design a rule whereby one can be fairly certain that the quality of the products conforms to required standards. (I.C.W.A., 1976)

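Solution. Given the mean diameter, i.e., $\mu = 0.574$, the S.D. $\sigma = 0.008$ and the size of sample, i.e., $n = 6$. Mean $\pm 3\sigma$ covers 99.73% of the observations. Hence with 99.73% confidence, the sample mean $\bar{X}$ must lie in the range $\mu - 3\sigma/\sqrt{n}$ to $\mu + 3\sigma/\sqrt{n}$. Thus the required limits are:

$$0.574 \pm \frac{3 \times 0.008}{\sqrt{6}} = 0.574 \pm 0.0098$$

or between 0.564 and 0.584 cm. The rule, therefore, is: if the mean diameter of a sample of 6 falls outside these limits, the machine should be checked; otherwise it may be taken to be in proper working order.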
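The rule can be stated as a small check. In this sketch (function name hypothetical), the limits $\mu \pm 3\sigma/\sqrt{n}$ are computed once and a sample of six diameters is judged by whether its mean lies inside them. The diameters in the usage lines are made up for illustration.

```python
from math import sqrt

MU, SIGMA, N = 0.574, 0.008, 6           # design mean, S.D. (cm) and sample size
HALF_WIDTH = 3 * SIGMA / sqrt(N)
LOWER, UPPER = MU - HALF_WIDTH, MU + HALF_WIDTH

def machine_in_order(diameters):
    """True if the mean of the sample lies within the 3-sigma limits."""
    mean = sum(diameters) / len(diameters)
    return LOWER <= mean <= UPPER

print(f"limits: {LOWER:.3f} cm to {UPPER:.3f} cm")
print(machine_in_order([0.572, 0.578, 0.575, 0.570, 0.576, 0.574]))   # mean about 0.5742
print(machine_in_order([0.590, 0.586, 0.588, 0.591, 0.585, 0.589]))   # mean about 0.5882
```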


Illustration 15. The following data show the values of sample mean $\bar{X}$ and range R for samples of size 5 each. Calculate the values for the central line and control limits for the mean chart and range chart, and determine whether the process is in control.

Sample No.    1     2     3     4     5     6     7     8     9     10
Mean (X̄)    11.2  11.8  10.8  11.6  11.0   9.6  10.4   9.6  10.6  10.0
Range (R)     7     4     8     5     7     4     8     4     7     9

(Conversion factors for n = 5 are $A_2 = 0.577$, $D_3 = 0$ and $D_4 = 2.115$.) (I.C.W.A., 1976)
Solution.

DETERMINING CONTROL LIMITS FOR X̄ AND R CHARTS

Sample     X̄       R
   1      11.2     7
   2      11.8     4
   3      10.8     8
   4      11.6     5
   5      11.0     7
   6       9.6     4
   7      10.4     8
   8       9.6     4
   9      10.6     7
  10      10.0     9

Number of samples = 10     ΣX̄ = 106.6     ΣR = 63


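Control limits for the $\bar{X}$ chart:

$$\text{Central line} = \bar{\bar{X}}, \qquad \text{UCL} = \bar{\bar{X}} + A_2\bar{R}, \qquad \text{LCL} = \bar{\bar{X}} - A_2\bar{R}$$

$$\bar{\bar{X}} = \frac{106.6}{10} = 10.66, \qquad \bar{R} = \frac{63}{10} = 6.3, \qquad A_2 = 0.577$$

$$\text{UCL} = 10.66 + 0.577 \times 6.3 = 14.2951$$

$$\text{LCL} = 10.66 - 0.577 \times 6.3 = 7.0249$$

Control limits for the R chart:

$$\text{Central line} = \bar{R}, \qquad \text{UCL} = D_4\bar{R}, \qquad \text{LCL} = D_3\bar{R}$$

Hence central line = 6.3

$$\text{UCL} = 6.3 \times 2.115 = 13.3245$$

$$\text{LCL} = 6.3 \times 0 = 0$$

Since all the sample means and ranges lie between the control limits, the process is in a state of control.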
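The X̄ and R chart arithmetic can be reproduced as below. The factors $A_2 = 0.577$, $D_3 = 0$ and $D_4 = 2.115$ are those given in the problem for samples of five; the code simply applies them and reports any sample outside either pair of limits. Variable names are illustrative.

```python
means = [11.2, 11.8, 10.8, 11.6, 11.0, 9.6, 10.4, 9.6, 10.6, 10.0]
ranges = [7, 4, 8, 5, 7, 4, 8, 4, 7, 9]
A2, D3, D4 = 0.577, 0.0, 2.115          # conversion factors for n = 5, as given

x_dbar = sum(means) / len(means)        # grand mean: central line of the X-bar chart
r_bar = sum(ranges) / len(ranges)       # mean range: central line of the R chart

x_lcl, x_ucl = x_dbar - A2 * r_bar, x_dbar + A2 * r_bar
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar

out = [i + 1 for i, (x, r) in enumerate(zip(means, ranges))
       if not (x_lcl <= x <= x_ucl and r_lcl <= r <= r_ucl)]
print(f"X-bar chart: CL = {x_dbar:.2f}, LCL = {x_lcl:.4f}, UCL = {x_ucl:.4f}")
print(f"R chart:     CL = {r_bar:.2f}, LCL = {r_lcl:.4f}, UCL = {r_ucl:.4f}")
print("In control" if not out else f"Samples out of control: {out}")
```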
8 Business Forecasting
The growing competition, rapidity of change in circumstances and the trend towards automation demand that decisions in business be based not purely on guesses and hunches but on a careful analysis of data concerning the future course of events. More time and attention is given to the future than to the past, and the question "What is likely to happen?" takes precedence over "What has happened?", though no attempt to answer the first can be made without the facts and figures needed to answer the second.

When estimates of future conditions are made on a systematic basis, the process is referred to as "forecasting" and the figure or statement obtained is known as a "forecast". In a world where the future is not known with certainty, virtually every business and economic decision rests upon a forecast of future conditions. In fact, when a man assumes the responsibility of running a business, he automatically takes on the responsibility for attempting to forecast the future, and to a very large extent his success or failure will depend upon his ability to forecast the future course of events successfully. Forecasting aims at reducing the areas of uncertainty that surround management decision-making with respect to costs, profit, sales, production, pricing, capital investment, and so forth. If the future were known with certainty, forecasting would be unnecessary. Decisions could be made and plans formulated on a once-and-for-all basis, without the need for subsequent revision. But uncertainty does exist and future outcomes are rarely assured; therefore, an organised system of forecasting is necessary, rather than predictions based on hunches, intuition or guesses.
Role of Forecasting in Business

It should be realised at the outset that the object of business forecasting is not to determine a curve or series of figures that will tell exactly what will happen, say, a year in advance, but to make an analysis based on definite statistical data which will enable an executive to take advantage of future conditions to a greater extent than he could without it. In many respects the future tends to move like the past. This is a good thing, since without some element of continuity between past, present and future, there would be little possibility of successful prediction. But history does not repeat itself exactly, and we would hardly expect economic conditions next year or over the next ten years to follow a clear-cut precedent. Yet past patterns frequently prevail sufficiently to justify using the past as a basis for predicting the future.

While forecasting, one should note that it is impossible to forecast the future precisely; some range of error must always be allowed for in the forecast. Statistical forecasts are those in which we can use the mathematical theory of probability to measure the risk of error in predictions.


Steps in Forecasting 
Forecasting business change involves more than analysis of statistical data; it also embodies the prediction of economic change such as secular trend, seasonal variations and cyclical variations, and a consideration of cause and effect.


Broadly speaking, the forecasting of business fluctuations consists 
of the following steps : 

1. Understanding why changes in the past have occurred. One of 
the basic principles of statistical forecasting—indeed of all forecasting 
when historical data are available—is that the forecaster should use the 
data on past performance to get a “speedometer reading” of the current 
rate (say, of sales) and of how fast this rate is increasing or decreasing. 
The current rate and changes in the rate, its “acceleration” and “deceleration”, constitute the basis of forecasting. Once they are known, various mathematical techniques can develop projections from them. If an
attempt is made to forecast business fluctuations without understanding 
why past changes have taken place, the forecast will be purely mechanical, 
based solely upon the application of mathematical formulae and sub- 
ject to serious error. 

2. Determining which phases of business activity must be measured. 
After it is known why business fluctuations have occurred, or at least there is a reasonable hypothesis about why, it is necessary to measure certain phases of business activity in order to predict what changes will probably follow the present level of activity.


3. Selecting and compiling data to be used as measuring devices. There is an interdependent relationship between the selection of statistical data and the determination of why business fluctuations occur. Statistical data cannot be selected and compiled in an intelligent manner unless there is a sufficient understanding of business fluctuations; likewise, it is important that the reasons for business fluctuations be stated in such a manner that it is possible to secure data that are related to the reasons.


4. Analysing the data. In this last step the data are analysed in
the light of one's understanding of the reason why change occurs. For 
example, if it is reasoned that a certain combination of forces will result 
in a given change, the statistical part of the problem is to measure these 
forces, and from the data available, to draw conclusions on the future 
course of action. The methods of drawing conclusions may be called 
forecasting techniques, and they represent any one of a large number of 
analytical devices for summarising data and drawing inferences from the 
summaries. 
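The “speedometer reading” described in step 1 can be sketched as follows, assuming the current rate is taken as the latest first difference of the series and the acceleration as the change in that difference. The function names and the sales figures are hypothetical, and the projection simply assumes that the present acceleration continues, which is exactly the kind of purely mechanical forecast the text warns against making without understanding why the past changes occurred.

```python
def speedometer(series):
    """Return the current rate of change and its acceleration.

    Rate: the latest first difference. Acceleration: the change in that
    difference since the previous period (the latest second difference).
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    rate = series[-1] - series[-2]
    acceleration = rate - (series[-2] - series[-3])
    return rate, acceleration


def project(series, periods=1):
    """Mechanical projection that assumes the current acceleration persists."""
    rate, acceleration = speedometer(series)
    level = series[-1]
    forecasts = []
    for _ in range(periods):
        rate += acceleration
        level += rate
        forecasts.append(level)
    return forecasts


sales = [100, 104, 110, 118, 128]  # hypothetical monthly sales
print(speedometer(sales))  # (10, 2)
print(project(sales, 2))   # [140, 154]
```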

Methods of Forecasting 


There is nothing new about business forecasting, as for centuries businessmen
have tried to adjust themselves in such a manner as to make
the best out of the future conditions. The rule-of-thumb method has 
been widely practised in business. It consists in deciding about the future 
in terms of past experience and familiarity with the problem at hand. 
Even today this method is very widely used in business. However, it can 
lead to absurd conclusions if employed by the inexperienced. 


In recent years the techniques of forecasting have improved to a marked degree and are applicable to almost every sphere of business activity. Attempts are being made to make forecasting as scientific as possible. The basis of scientific forecasting is statistics, i.e., numerical data on business trends, with which many businessmen fail to acquaint themselves. However, forecasting business change involves more than an analysis of statistical data; it also embodies the prediction of economic change, such as secular trend and seasonal variation, and a consideration
of cause and effect. To handle the increasing variety of managerial 
forecasting problems many forecasting techniques have been developed 
in recent years. Each has its special use, and care must be taken to 
select the correct technique for a particular situation. Also before 
applying a method of forecasting the following questions should be 
answered : 

(1) What is the purpose of the forecast—how is it to be used ? 

(2) What are the dynamics and components of the system for which the forecast will be made ?

(3) How important is the past in estimating the future ? 

The following are some of the important methods of forecasting : 

1. Business Barometers

2. Extrapolation 

3. Regression Analysis 

4. Econometric Models 

5. Forecasting by the use of Time Series Analysis 

6. Opinion Polling 

7. Causal Models

A forecast is usually a combination of several techniques. 


1. Business Barometers. Of great assistance in practical forecasting is a series that can be used as an “index” or “indicator” of the basic conditions related to the industry. The term “barometer” is also widely, though loosely, used in business statistics; sometimes it is used to mean simply an indicator of the present economic situation and sometimes to designate an indicator of future conditions.

The following are some of the important series which aid business- 
men in forecasting : 

1. Gross national product

2. Employment 

3. Wholesale prices 

4. Consumer prices 

5. Industrial production 

6. Volume of bank deposits and currency outstanding 

7. Consumer credit 
8. Disposable personal income 

9. Departmental store sales 

10. Stock prices
11. Bond yields. 

This list is by no means exhaustive, nor is the arrangement necessarily in order of importance. Several of the above series are composite averages or totals, or indexes of these averages or totals. Analysis should also be made of some of the major components of these series.
Index numbers relating to different activities in the field of production, trade, finance, etc., may also be combined into a general index of Business Activity. This general index refers to the general conditions of trade and industry. But the behaviour of individual industries or trades might show a different trend from that of the composite Business Activity Index: although a general boom or depression may be reflected in a majority of separate industries and trades, some industries and trades might show quite contrary tendencies. Hence, the study of general business conditions as revealed by the composite Business Index should be supplemented by special studies of individual businesses based on separate indices. The trends indicated by barometers will guide the businessman as to whether stocks of goods should be increased or reduced, whether to increase investment or not, etc.
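A minimal sketch of how several barometer series might be combined into a general index of Business Activity is given below. It assumes each series is first converted to relatives (percentages of a base-year value) and the relatives are then averaged with fixed weights. The series names, figures and weights are hypothetical, and this weighted average of relatives is one simple scheme among many, not the method of any particular agency.

```python
# Hypothetical yearly readings for three barometer series (illustrative only).
series = {
    "industrial_production": [120.0, 126.0, 131.0, 129.0],
    "employment": [200.0, 204.0, 214.0, 222.0],
    "bank_deposits": [50.0, 54.0, 57.5, 60.0],
}
# Assumed weights reflecting the relative importance of each series.
weights = {"industrial_production": 0.5, "employment": 0.2, "bank_deposits": 0.3}


def relatives(values, base=0):
    """Express every value as a percentage of the base-period value."""
    return [100.0 * v / values[base] for v in values]


def composite_index(series, weights, base=0):
    """Weighted arithmetic mean of the base-period relatives of all series."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    periods = len(next(iter(series.values())))
    index = [0.0] * periods
    for name, values in series.items():
        for t, rel in enumerate(relatives(values, base)):
            index[t] += weights[name] * rel
    return index


print([round(v, 1) for v in composite_index(series, weights)])
```

In the last year of these made-up figures industrial production falls while the composite index still rises, which illustrates why the general index should be supplemented by separate indices for individual industries.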


2. Extrapolation. Extrapolation is the simplest, yet often a useful,
method of forecasting. In many forecasting situations the most
reasonable expectation is that the variable will follow its already estab- 
lished path. Extrapolation relies on the relative constancy in the pattern 
of past movements in some time series. Strictly speaking, nothing needs to 
be known about causation—why the series moves as it does. But in 
practice the justification does involve the nature of the growth process 
being described. Extrapolation is used frequently for sales forecasts and 
for other estimates when "better" forecasting methods may not be 
justified. 

Since extrapolation assumes that the variable will follow its established pattern of growth, the problem is to determine accurately the appropriate trend curve and the values of its parameters. Numerous alternative trend curves can be used for the purpose of business forecasting. Some of the most useful ones are :


(a) Arithmetic trend. The straight-line arithmetic trend assumes 
that growth will be by a constant absolute amount each year. 

(b) Semi-log trend. The semi-logarithmic trend assumes a constant percentage increase each year. Since the annual increment is constant in logarithms, this trend plots as a straight line on paper with a logarithmic vertical scale.


(c) Modified exponential trend. This curve assumes that each increment of growth will be a constant percentage (less than 100) of the previous one. The curve generally tends to approach, but never quite reaches, a constant asymptote, which may be thought of as an upper limit.

(d) Logistic curve. The logistic curve has both an upper asymptote 
and a lower asymptote. It assumes a ‘law of growth’ involving increasing 
increments from an initial low value and then gradual slowing down of 
growth as ‘maturity’ is approached. 


(e) The Gompertz curve. The Gompertz curve has properties similar to those of the logistic curve described above and is often used to describe the growth of industrial output.
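For reference, the five curves are commonly written as follows, with $t$ measured in years from a chosen origin and $a$, $b$, $k$ constants fitted to the data. These are the usual textbook parameterisations, offered as an aid rather than as the only possible forms; logarithms may be taken to any base.

$$
\begin{aligned}
&\text{(a) Arithmetic:} && Y_t = a + bt \\
&\text{(b) Semi-log:} && \log Y_t = a + bt \\
&\text{(c) Modified exponential:} && Y_t = k + a\,b^t, \quad 0 < b < 1 \\
&\text{(d) Logistic:} && \frac{1}{Y_t} = k + a\,b^t, \quad a, k > 0,\ 0 < b < 1 \\
&\text{(e) Gompertz:} && \log Y_t = \log k + (\log a)\,b^t, \quad 0 < a < 1,\ 0 < b < 1
\end{aligned}
$$

In (c) the ratio of successive increments, $(Y_{t+1} - Y_t)/(Y_t - Y_{t-1})$, equals $b$, which is the constant percentage described above; with $a < 0$ the curve rises towards the upper limit $k$. In (d), $Y_t$ tends to $1/k$ as $t$ increases and to $0$ as $t$ decreases, giving the upper and lower asymptotes. In (e), $Y_t = k\,a^{b^t}$ tends to $k$ as $t$ increases.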


Selection of an appropriate growth curve can be guided by empirical and theoretical considerations. Empirically, it is a question of selecting the curve that best fits the past movement of the data. Theoretically, the logic of the situation may support a particular growth pattern. For example, population growth, when there are no restraints, implies a geometric pattern of growth, as has been known since Malthus. With limited resources, however, population is sometimes thought to grow along a logistic curve. Lest these theoretical notions be taken too seriously, it should be emphasised that empirical considerations may lead us quickly to a more realistic and less restrictive notion of the relevant growth curve.
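The empirical side of this choice can be sketched as follows: fit an arithmetic trend by least squares to $Y$, fit a semi-log trend by least squares to $\log Y$, keep whichever leaves the smaller sum of squared deviations from the observed data, and extrapolate it. The sales figures are hypothetical and were chosen to grow by exactly 10 per cent a year, and the helper names are illustrative; only two of the five curves are tried, to keep the sketch short.

```python
import math


def fit_line(t, y):
    """Least-squares estimates (a, b) for the straight line y = a + b*t."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    return y_bar - b * t_bar, b


def arithmetic_trend(t, y):
    """Straight-line trend: constant absolute growth per period."""
    a, b = fit_line(t, y)
    return lambda s: a + b * s


def semilog_trend(t, y):
    """Straight line fitted to log y: constant percentage growth per period."""
    a, b = fit_line(t, [math.log(v) for v in y])
    return lambda s: math.exp(a + b * s)


def sse(trend, t, y):
    """Sum of squared deviations of the observed values from a fitted trend."""
    return sum((yi - trend(ti)) ** 2 for ti, yi in zip(t, y))


# Hypothetical annual sales growing by 10 per cent a year (t = 0 is the first year).
years = [0, 1, 2, 3, 4, 5]
sales = [100.0, 110.0, 121.0, 133.1, 146.41, 161.051]

trends = {
    "arithmetic": arithmetic_trend(years, sales),
    "semi-log": semilog_trend(years, sales),
}
best = min(trends, key=lambda name: sse(trends[name], years, sales))
print(best, round(trends[best](6), 1))  # semi-log 177.2
```

A close fit to the past does not guarantee that the pattern will continue, so the chosen curve should still be checked against the theoretical considerations above.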

3. Regression Analysis. The regression approach offers many valuable contributions to the solution of the forecasting problem. It is the means by which we select, from among the many possible or theoretically suggested relationships between variables in a complex economy, those which will be useful for forecasting. With it, one makes the jump from an intuitive evaluation of the connection between two variables to precise, quantified knowledge. If two variables are functionally related, a knowledge of one makes possible an estimate of the other. For example, if we know that advertising expenditure and sales are correlated, then for a given advertising expenditure we can estimate the probable sales, or vice versa.
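The advertising example can be sketched with a least-squares regression line. The figures below are hypothetical, in thousands of rupees, and the helper name `regression_line` is illustrative; it fits the line of the second argument on the first.

```python
def regression_line(x, y):
    """Least-squares line of y on x; returns (a, b) for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical figures: advertising expenditure and sales (thousands of rupees).
advertising = [2, 4, 6, 8, 10]
sales = [5, 9, 11, 15, 20]

a, b = regression_line(advertising, sales)  # regression of sales on advertising
print(f"sales = {a:.1f} + {b:.1f} x advertising")
print("forecast sales at advertising of 12:", round(a + b * 12, 1))

# The "vice versa" case uses the other line, advertising on sales.
c, d = regression_line(sales, advertising)
print("advertising associated with sales of 25:", round(c + d * 25, 1))
```

Estimating advertising from a given level of sales calls for the regression of advertising on sales, not for inverting the first line; the two lines coincide only when the correlation is perfect.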

Regression relationship may involve only one predicted or dependent 
and one independent variable—simple regression, or it may involve 
relationships between the variable to be forecast and several independent 
variables—multiple regression. Statistical techniques to estimate the 
regression equations are often fairly complex and time consuming but 
there are many computer programmes now available that estimate simple 
and multiple regressions quickly. 

There are two dangers in using regression analysis for forecasting : 

(1) There is the danger of a mechanistic approach that accepts with little question the relationship which the calculations reveal (perhaps the one with the highest r²) and applies it to the forecast. There are many possibilities for spurious correlation among time series, as many series move together over time even where there is no conceivable connection between them.

(2) There is the risk that the estimated regression is false. The forecaster must always use his judgment and knowledge of the facts and of the underlying theory.
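The first danger can be illustrated with two hypothetical series that have no connection other than an upward drift: their year-to-year changes were constructed to be uncorrelated, yet the coefficient of correlation of their levels is about 0.97. Correlating first differences, as below, is one rough check for such spurious relationships, offered here as an illustration rather than a complete remedy.

```python
def correlation(x, y):
    """Karl Pearson's coefficient of correlation r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def first_differences(series):
    """Period-to-period changes of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Two hypothetical annual series that share nothing but an upward drift:
# their year-to-year changes were chosen to be uncorrelated.
series_a = [12, 15, 15, 19, 20, 24, 24, 28, 29, 32]
series_b = [5, 8, 11, 14, 16, 17, 18, 20, 22, 23]

r_levels = correlation(series_a, series_b)
r_changes = correlation(first_differences(series_a), first_differences(series_b))
print(f"r of levels:  {r_levels:.2f}")  # 0.97
print(f"r of changes: {r_changes:.2f}")  # essentially zero
```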

4. Econometric Models. Econometric techniques, which originated in the eighteenth century, have recently gained in popularity for forecasting. Much of the revival of econometrics is attributed to the growth of computer technology. The term econometrics refers to the application of mathematical economic theory and statistical procedures to economic data in order to verify economic theorems and to establish quantitative results in economics. An econometrician is, therefore, an economist, a statistician and a mathematician, all in one. Econometric models take the form of a set of simultaneous equations. The values of the constants in such equations are supplied by a study of statistical time series, and a large number of equations may be necessary to produce an adequate model. The work of computation is greatly facilitated by electronic data processing equipment such as computers.

At the present time, most short-term forecasting uses only statistical methods with little qualitative information. However, in the years to come, when most large companies develop and refine econometric models of their major businesses, this tool of forecasting will become more popular. It should be remembered, though, that the development of an econometric model requires sufficient data for the correct relationships to be established. Hence, when data are scarce, for example when a product is first introduced into a market, this method cannot be profitably employed.

The econometric model is in principle the most formal, since the forecast is based on an explicit mathematical model. The model states in detail and in quantitative terms the way in which the various aspects of the economy are interrelated. Theoretically, the model makes possible a wholly mechanical forecast, because once values have been estimated for the exogenous variables, the solution of the model gives specific values for the predicted variables. But in actual practice qualitative and quantitative forecasters have tended to come together. The ‘artist’ forecaster has become fully aware that he needs quantitative relationships, while the econometric forecaster has learnt that in some instances quantitative relationships have to be modified by qualitative factors.
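To make “the solution of the model” concrete, here is a deliberately tiny two-equation model: a consumption function C = a + bY and the income identity Y = C + I, with investment I treated as exogenous. The parameter values are hypothetical stand-ins for constants that would in practice be estimated from time series, and real econometric models contain many more equations; the sketch only shows how, once the exogenous variable is given, the model yields specific values for the predicted variables.

```python
def solve_model(investment, a=20.0, b=0.75):
    """Solve a two-equation model for its endogenous variables.

    Behavioural equation: C = a + b*Y   (consumption function)
    Identity:             Y = C + I     (income = consumption + investment)
    Substituting gives the reduced form Y = (a + I) / (1 - b).
    """
    if not 0 <= b < 1:
        raise ValueError("b (marginal propensity to consume) must lie in [0, 1)")
    income = (a + investment) / (1 - b)
    consumption = a + b * income
    return income, consumption


# Forecasts under two alternative assumptions about exogenous investment.
for investment in (30.0, 40.0):
    income, consumption = solve_model(investment)
    print(f"I = {investment:.0f}: Y = {income:.1f}, C = {consumption:.1f}")
# I = 30: Y = 200.0, C = 170.0
# I = 40: Y = 240.0, C = 200.0
```

Running the model under alternative assumptions about the exogenous variable in this way is also how such models can be used to test out alternative assumptions about Government policy.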


Because the econometric model provides the forecaster with a record of the prediction, with a clear statement of the assumptions concerning the exogenous variables and of the solution of the model, it is often possible, or at least easier, to trace and reproduce the causes of successes as well as failures. One can learn just where errors were made and, hopefully, where improvements can be made. Thus, discredited hypotheses may be dropped and new ones substituted, which ultimately will lead to a better understanding of the economic system and of business fluctuations.

The econometric models are not very popular in practice because 
it is probably neither necessary nor feasible for every business forecaster 
to construct his own model of the economy. The effort and cost involved 
in a fully developed econometric model are well beyond most forecasting
operations. Thus, most forecasters will probably rely for some time on 
the basic aggregate models developed at research institutes or at uni- 
versities. These models may be used to make predictions and to test out 
alternative assumptions about Government policy or the other exogenous 
aspects of the economy. With the help of the models and, hopefully, 
sector analysis of his own industry the business forecaster will be in a 
better position to augment other more familiar approaches. The greater 
the understanding of the various forecasting methods and of their inter- 
relationships, the better the forecasts will be. 

5. Forecasting by the use of Time Series Analysis. Time series 
analysis helps to identify and explain : 

(1) Any regular or systematic variation in the series of data which is due to seasonality, the “Seasonals”.

(2) Cyclical patterns.

(3) Trends in the data.

(4) Growth rates of these trends.

Unfortunately, most existing methods identify only the seasonals, the combined effect of trends and cycles, and the irregular or chance component. That is, they do not separate trend from cycles.

This is not to say that those other effects are not to some degree manageable. The suggestion is rather that the analysis of trend and seasonal effects, and the projection of these two sets of forces, should be understood to be the first step in the forecast, and that further refinements may then be made, taking into account such cyclical and residual forces as may be manageable.

Many statisticians consider time series analysis a somewhat useless 
tool. One critic, M.J. Moroney, said that “Economic forecasting like
weather forecasting in England is only valid for the next six hours or so. 
Beyond that it is sheer guesswork.” 

In any event, although the limitations of time series analysis must be understood, the importance and usefulness of these procedures should not be underestimated. The analysis serves two purposes :

(1) It does provide an initial approximation forecast that takes into 
account those empirical regularities which may, with reasonable assurance, 
be expected to persist. 

(2) After the trend and seasonal effects have been identified and 
measured, the original data may be adjusted for these influences, yielding 
a new historical time series consisting of the trend and seasonally adjusted 
data. This new time series may be very helpful in the analysis and inter- 
pretation of cyclical and residual influences. 
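A minimal sketch of the adjustment described in (2) follows. It assumes an additive model (value = trend + seasonal + residual) rather than the multiplicative, ratio-to-moving-average form often used with index numbers: a centred four-quarter moving average estimates the trend, the average deviation from it in each quarter estimates the seasonal effect, and subtracting those effects gives seasonally adjusted data. The quarterly figures are hypothetical and contain no cyclical or irregular movement, so the seasonal pattern is recovered exactly.

```python
def centred_moving_average(values, period=4):
    """Centred moving average for an even period such as 4 quarters.

    The window spans period + 1 values with half weight on the two ends,
    so it is centred on a data point. Positions without a full window are None.
    """
    if period % 2:
        raise ValueError("this centring scheme assumes an even period")
    half = period // 2
    cma = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half : t + half + 1]
        cma[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return cma


def seasonal_effects(values, period=4):
    """Additive seasonal effects: mean deviation from the moving average per season."""
    cma = centred_moving_average(values, period)
    deviations = [[] for _ in range(period)]
    for t, (v, m) in enumerate(zip(values, cma)):
        if m is not None:
            deviations[t % period].append(v - m)
    raw = [sum(d) / len(d) for d in deviations]
    correction = sum(raw) / period  # make the effects sum to zero over a year
    return [r - correction for r in raw]


def seasonally_adjust(values, effects):
    """Remove the seasonal effect of each period from the original data."""
    return [v - effects[t % len(effects)] for t, v in enumerate(values)]


# Hypothetical quarterly sales for three years (t = 0 is the first quarter):
# a linear trend 100 + 2t plus a fixed seasonal pattern, with no cycle or noise.
pattern = [-6.0, 2.0, 9.0, -5.0]
sales = [100 + 2 * t + pattern[t % 4] for t in range(12)]

effects = seasonal_effects(sales)
adjusted = seasonally_adjust(sales, effects)
print([round(e, 2) for e in effects])   # recovers the seasonal pattern
print([round(v, 2) for v in adjusted])  # leaves only the trend
```

With real data the adjusted series would still contain cyclical and irregular movements, which is precisely what makes it useful for studying those residual influences.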

It should be noted that this method of forecasting can be used only
when several years’ data for a product or product line are available and 
when relationships and trends are both clear and relatively stable. 

6. Opinion Polling. Opinion polling is another basis for forecasting. The Survey Research Centre of the University of Michigan conducts an annual poll regarding the future plans of consumers. The answers to many questions are translated into short-run demand for colour television sets, automobiles and other consumer products.

7. Causal Models. A causal model is the most sophisticated kind 
of forecasting tool. It expresses mathematically the relevant causal 
relationships, and may include pipeline considerations (i.e., inventories) 
and market survey information. It may also directly incorporate the 
results of a time series analysis. 

The causal model takes into account everything known of the 
dynamics of the flow system and utilises predictions of related events such 
as competitive action, strikes and promotions. If the data are available, 
the model generally includes factors for each location in the flow chart and connects these by equations to describe overall product flow. If certain kinds of data are lacking, it may initially be necessary to make assumptions about some of the relationships and then track what is happening to determine if the assumptions are true. Typically, a causal model
is continually revised as more knowledge about the system becomes 
available. 


Choice of a Method of Forecasting 


The selection of an appropriate method depends on many factors:
the context of the forecast, the relevance and availability of historical data, 
the degree of accuracy desired, the time period for which forecasts are 
required, the cost benefit (or value) of the forecast to the company, and 
the time available for making the analysis. 

These factors must be weighed constantly, and on a variety of 
levels. In general, for example, the forecaster should choose a technique 
that makes the best use of available data. If he can readily apply one technique of acceptable accuracy, he should not try to “gold plate” by using a more advanced technique that offers potentially greater accuracy but requires non-existent information or information that is costly to
obtain. Furthermore, where a company wishes to forecast with reference 
to a particular product, it must consider the stage of the product's life 
cycle for which it is making the forecast. 


Theories of Business Forecasting 


Several theories have been developed out of research conducted by individuals and institutions on business forecasting. Important amongst these are :

(1) Sequence or Time-lag Theory. 

(2) Action and Reaction Theory. 

(3) Economic Rhythm Theory. 

(4) Specific Historical Analogy. 

(5) Cross-section Analysis. 


1. Sequence or Time-lag Theory. This is by far the most important theory of business forecasting. It is based on the assumption that most business data have a lag and lead relationship, i.e., changes in business are successive and not simultaneous. There is a time-lag between different movements. For example, expenditure on advertisement may not at once lead to an increase in sales. Similarly, when the government makes use of deficit financing, it leads to inflationary pressures: the purchasing power of people goes up, and wholesale prices and then retail prices start rising. With the rise in retail prices the cost of living goes up, and with it there is a demand for increased wages. Thus one factor, i.e., more money in circulation, has affected various fields of economic activity not simultaneously but successively. Similarly, when excise duties are increased by the Government, they result in an increase in prices, which would lead to a higher demand for wages.


The reliability of the forecast depends in this case upon the accuracy with which the time-lag is estimated. Also, forecasting should not be done mechanically: due allowance should be made for the effects of current economic conditions and other special factors operating at that time, and the forecasts modified in the light of these special factors.
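One simple way to estimate a time-lag, offered here as an illustration rather than as a method prescribed by the theory, is to shift the lagging series against the leading one and choose the shift that gives the highest coefficient of correlation. The quarterly figures are hypothetical and were constructed so that sales respond to advertising exactly two quarters later; the function names are illustrative.

```python
def correlation(x, y):
    """Karl Pearson's coefficient of correlation between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def estimate_lag(leading, lagging, max_lag):
    """Try shifts 0..max_lag and return (lag, r) with the highest correlation.

    A lag of k pairs leading[t] with lagging[t + k].
    """
    best_k, best_r = 0, correlation(leading, lagging)
    for k in range(1, max_lag + 1):
        r = correlation(leading[: len(leading) - k], lagging[k:])
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Hypothetical quarterly figures, constructed so that
# sales[t] = 50 + 3 * advertising[t - 2] from the third quarter on.
advertising = [5, 8, 6, 9, 12, 10, 7, 11, 14, 9, 8, 13]
sales = [70, 72, 65, 74, 68, 77, 86, 80, 71, 83, 92, 77]

lag, r = estimate_lag(advertising, sales, max_lag=3)
print(f"estimated lag: {lag} quarters (r = {r:.2f})")
```

With real data the peak is rarely this sharp, and the estimated lag should still be tempered by judgment about current conditions, as noted above.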


2. Action and Reaction Theory. This theory is based on two assumptions : (1) every action has a reaction after some time, and (2) the magnitude of the reaction depends upon the magnitude of the original action. Thus if the price of wheat has gone up above a certain level in a certain period, there is a likelihood that after some time it will go down below the normal level. According to this theory, therefore, a certain level of business activity is normal; sub-normal or abnormal conditions cannot remain so for ever, and there is bound to be a reaction to them. Thus, we find four phases of a business cycle :


1. Prosperity.

2. Decline. 

3. Depression. 
4. Improvement. 


Since the theory regards a certain level of business activity as normal, the normal level must be very carefully estimated at the time of making forecasts. However, in practice, it is really difficult to decide precisely what constitutes ‘normality’.

3. Economic Rhythm Theory. The basic assumption of this theory is that ‘history repeats itself’, and hence the exponents of this theory believe that economic phenomena behave in a rhythmic order. Cycles of nearly the same intensity and duration tend to recur. Thus, the available historical data have to be analysed into their component parts and different types of fluctuations influencing them have to be segregated. A
trend is then obtained which will represent a long-term tendency of growth 
or decline. This trend line is projected a number of years into the future 
either by the freehand method or by the mathematical method. This is 
done on the assumption that the trend line represents the normal growth 
or decline of the series. 


This theory has important limitations. First, business cycles are not strictly periodic, so the statistical extrapolation of cycles is not very satisfactory. Secondly, any change in either the amplitude or the duration of a cycle introduces an error, whereas the businessman is primarily interested in predicting the turning points of a cycle.

4. Specific Historical Analogy. This theory is based on a more 
realistic assumption, i.e. that all business cycles are not uniform in 
amplitude or duration and as such the use of history is made not by 
projecting any fancied economic rhythm into the future, but by selecting 
some specific previous situation which has many of the earmarks of the
present and concluding that what happened in that previous situation will 
happen in the present one also. 

What is done is that a time series relating to the data in question is thoroughly scrutinised and from it a period is selected in which conditions were similar to those prevailing at the time of making the forecasts. The course which events took in the past under similar circumstances is then studied, which gives an idea of the likely course which the phenomenon in question would follow. For example, after World War II many persons forecast a depression because World War I had been followed by a depression.


5. Cross-section Analysis. This theory is based on the knowledge and interpretation of current forces rather than projection of past trends. The theory assumes that no two cycles are alike, but that like causes always produce like results. All the factors bearing upon a given situation are assembled and, relying upon the knowledge of economic processes, the forecaster concludes whether the situation is favourable or not.
Immediate recognition is given to the fact that business conditions are 
shaped by simultaneous inflationary and deflationary forces. Predomi- 
nance of inflationary forces results in booms, whereas predominance of 
deflationary forces leads to depression. The forecaster who utilizes this 
method prepares three lists: one which itemizes inflationary forces, a 
second which enumerates stable forces and a third which sets forth 
deflationary forces on the basis of judgment. 


Obviously, the dominant forces change from time to time. Factors 
which need careful attention include technological development, supply-demand relationships, governmental policies and businessmen's expectations. In regard to the latter, several organisations regularly conduct surveys of executive opinion concerning future trends of general business conditions and selected series of business data.


Forecasting Agencies 

Business forecasting has become a specialized job, and in many advanced countries like the U.S.A. and the U.K. there are forecasting agencies which employ expert statisticians to analyse and interpret statistical material and publish results. However, in India such agencies, though badly needed, have not yet come up. The important forecasting agencies of the U.S.A. and the U.K. are :

U.S.A. : (1) Harvard Committee of Economic Research.
         (2) Brookmire Economic Service.

U.K.   : (1) London and Cambridge Economic Service.
         (2) Economists’ Organisation.

In our country most of the businessmen even today depend upon 
their intuition and judgment rather than on scientific analysis of facts for 
deciding future course of action and thus scientific forecasting is practically 
absent. They believe more in God, chance and stars. However, it is 
gratifying to note that a large number of governmental and non-govern- 
mental agencies are engaged in the task of collection, analysis and 
interpretation of data affecting the various aspects of business and 
economy. If the businessman supplements his judgment with these 
important indicators a much better forecast would be possible. 


Caution while using Forecasting Techniques 

Forecasting business conditions is a complex task which cannot be 
accomplished with exactness. The economic, social and political forces 
which shape the future are many and varied; their relative importance 
changes almost constantly. It is obvious, therefore, that statistical 
methods cannot claim to be able to make the uncertain future certain. It 
does not follow from this disclaimer that statistical methods have nothing 
to contribute to business forecasting. The choice is not between forecasting and not forecasting, because the lack of a forecast implies a dangerous type of forecast. The mere warning of a possibility of a change is better than no warning at all; as is wisely said, “Forewarned is forearmed”. Also, it should be remembered that forecasts are not made just for the sake of forecasting, that is, they are not ends in themselves. Forecasts are made in order to assist management in determining a strategy and alternative strategies.

As a final word of caution it may be emphasised that no matter what 
methods of forecasting are used it is essential that the forecasts be check- 
ed by the judgment of individuals who are familiar with the business. 
While it is true that the use of statistical data is an attempt to substitute facts for subjective judgment, it does not mean that knowledge gained through experience in a given situation should be ignored in favour of quantitative data. It is particularly important to take into consideration any specific plans of the business that might alter the pattern of sales relative to the past data used for forecasting. More successful forecasting will result from combining statistical forecasting with judgment and knowledge of current business trends.

Also it is important to emphasise that any forecast should be 
reviewed frequently and revised in the light of the most recent information. 
Forecasting is not a one-shot operation. To be effective it requires continuous attention. Unanticipated developments will often change our picture of the future, or at least clarify it. In terms of any original decisions and actions that have been taken, this rule implies continuous modification wherever necessary. The technique of flexible budgets has been developed to permit the revision of budget estimates, and everyone dealing with forecasts should be alert to the need for constantly checking to see if anything has happened to change the outlook. Keeping accurately informed about the current level of business is probably the simplest insurance that can be secured against making wrong decisions regarding the future.

Partial and Multiple Correlation


The correlation and regression coefficients discussed earlier 
measure the degree and nature of the effect of one variable on another. 
While it is useful to know how one phenomenon is influenced by another, 
it is also important to know how one phenomenon is affected by several other variables. In nature, relationships tend to be complex rather than simple. One variable is related to a great number of others, many of which may be interrelated among themselves. For example, the yield of rice is affected by the type of soil, temperature, amount of rainfall, etc. Whether phenomena be biological, physical, chemical or economic, they are affected by a multiplicity of causal factors. It is part of the statistician’s task to determine the effect of one cause, of two or more causes acting separately or simultaneously, or of one cause when the effects of the others are eliminated. This is done with the help of multiple and partial correlation analysis.


PARTIAL CORRELATION 


It is often important to measure the correlation between a dependent variable and one particular independent variable when all other variables involved are kept constant, i.e., when the effects of all other variables are removed (often indicated by the phrase “other things being equal”). This can be obtained by calculating the coefficient of partial correlation. For example, if we have three variables, yield of wheat, amount of rainfall and temperature, and if we limit our analysis of yield and rainfall to periods when a certain average daily temperature existed, or if we treat the problem mathematically in such a way that changes in temperature are allowed for, the problem becomes one of partial correlation. Thus partial correlation analysis measures the strength of the relationship between Y and one independent variable in such a way that variations in the other independent variables are taken into account. A partial correlation coefficient is analogous to a partial regression coefficient in that all other factors are “held constant”. Simple correlation, on the other hand, ignores the effect of all other variables even though these variables might be quite closely related to the dependent variable or to the independent variable.

Partial Correlation Coefficient 


The partial correlation coefficient provides a measure of the relationship between the dependent variable and one particular independent variable, with the effect of the rest of the variables eliminated.


If we denote by $r_{12.3}$ the coefficient of partial correlation between $X_1$ and $X_2$ keeping $X_3$ constant, we find that

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}}$$

Similarly,

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23}^2}}$$

where $r_{13.2}$ is the coefficient of partial correlation between $X_1$ and $X_3$ keeping $X_2$ constant, and

$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13}^2}}$$

where $r_{23.1}$ is the coefficient of partial correlation between $X_2$ and $X_3$ keeping $X_1$ constant.

Thus for three variables $X_1$, $X_2$ and $X_3$ there will be three coefficients of partial correlation, each studying the relationship between two variables when the third is held constant.
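The three formulae share one pattern, so a single helper covers all of them. The following is a minimal Python sketch; the function name `partial_r` is our own and not part of the text. The values used are those of Illustration 1 below.

```python
import math


def partial_r(r_ij: float, r_ik: float, r_jk: float) -> float:
    """First-order partial correlation r_ij.k from three zero-order coefficients."""
    return (r_ij - r_ik * r_jk) / (math.sqrt(1 - r_ik**2) * math.sqrt(1 - r_jk**2))


# Zero-order coefficients of Illustration 1
r12, r13, r23 = 0.7, 0.61, 0.4

r23_1 = partial_r(r23, r12, r13)  # X2 and X3, with X1 held constant
r13_2 = partial_r(r13, r12, r23)  # X1 and X3, with X2 held constant
r12_3 = partial_r(r12, r13, r23)  # X1 and X2, with X3 held constant

print(round(r23_1, 3), round(r13_2, 3), round(r12_3, 3))  # -0.048 0.504 0.628
```

The arguments are always given in the order: the correlation of the two variables of interest, then the correlation of the first of them with the variable held constant, then that of the second with the variable held constant.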


Zero Order, First Order and Second Order Coefficients 


Partial coefficients such as $r_{12.3}$ and $r_{13.2}$ are often referred to as first order coefficients, since one variable has been held constant. Simple coefficients (correlation between two variables only) are called zero order coefficients, since no variables are held constant. $r_{12.34}$, $r_{13.24}$, etc., are called second order coefficients since two variables are kept constant. Stated generally, the order designation indicates the number of variables that have been held constant statistically.

Any first order coefficient can be determined from the values of three zero order coefficients. Similarly, second order coefficients can be obtained from first order coefficients.

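Illustration 1. In a trivariate distribution it is found that
$r_{12} = 0.7$, $r_{13} = 0.61$, $r_{23} = 0.4$
Find the values of $r_{23.1}$, $r_{13.2}$ and $r_{12.3}$. (B.Sc., Kanpur, 1972)

Solution.
$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13}^2}}$$
Substituting the given values
$$r_{23.1} = \frac{0.4 - (0.7\times0.61)}{\sqrt{1-(0.7)^2}\,\sqrt{1-(0.61)^2}} = \frac{0.4 - 0.427}{\sqrt{0.51}\,\sqrt{1-0.3721}} = \frac{-0.027}{0.714\times0.792} = -0.048$$

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.61 - (0.7)(0.4)}{\sqrt{1-(0.7)^2}\,\sqrt{1-(0.4)^2}} = \frac{0.61 - 0.28}{\sqrt{1-0.49}\,\sqrt{1-0.16}} = \frac{0.33}{0.714\times0.917} = 0.504$$

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.7 - (0.61\times0.4)}{\sqrt{1-(0.61)^2}\,\sqrt{1-(0.4)^2}} = \frac{0.7 - 0.244}{\sqrt{1-0.3721}\,\sqrt{1-0.16}} = \frac{0.456}{0.792\times0.917} = 0.628$$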

Illustration 2. On the basis of observations made on 30 cotton plants, the total correlations of yield of cotton ($X_1$), number of bolls, i.e. seed vessels ($X_2$), and height ($X_3$) are found to be:
$r_{12} = 0.8$, $r_{13} = 0.65$ and $r_{23} = 0.7$
Compute the partial correlation between yield of cotton and the number of bolls, eliminating the effect of height. (M. Com., Delhi, 1970)

Solution. We have to find the partial correlation between yield of cotton and the number of bolls, eliminating the effect of height, i.e. in terms of symbols we have to calculate $r_{12.3}$.
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}}$$
$r_{12} = 0.8$, $r_{13} = 0.65$ and $r_{23} = 0.7$
Substituting the values
$$r_{12.3} = \frac{0.8 - (0.65\times0.7)}{\sqrt{1-(0.65)^2}\,\sqrt{1-(0.7)^2}} = \frac{0.8 - 0.455}{\sqrt{1-0.4225}\,\sqrt{1-0.49}} = \frac{0.345}{0.760\times0.714} = \frac{0.345}{0.543} = 0.635$$
Illustration 3. The following zero-order correlation coefficients are given:
$r_{12} = 0.98$, $r_{13} = 0.44$ and $r_{23} = 0.54$
Calculate the partial correlation coefficient between the first and third variables keeping the effect of the second variable constant. (M. Com., Delhi, 1975)

Solution. We are required to find the value of $r_{13.2}$.
$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23}^2}}$$
$r_{12} = 0.98$, $r_{13} = 0.44$, $r_{23} = 0.54$
Substituting the values
$$r_{13.2} = \frac{0.44 - (0.98\times0.54)}{\sqrt{1-(0.98)^2}\,\sqrt{1-(0.54)^2}} = \frac{0.44 - 0.5292}{\sqrt{1-0.9604}\,\sqrt{1-0.2916}} = \frac{-0.0892}{0.199\times0.842} = \frac{-0.0892}{0.1676} = -0.532$$
Illustration 4. Is it possible to get the following from a set of experimental data:
(a) $r_{23} = 0.8$, $r_{31} = -0.5$, $r_{12} = 0.6$
(b) $r_{23} = 0.7$, $r_{31} = -0.4$, $r_{12} = 0.6$

Solution. (a)
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.6 - (-0.5)(0.8)}{\sqrt{1-(-0.5)^2}\,\sqrt{1-(0.8)^2}} = \frac{0.6 + 0.4}{\sqrt{0.75}\,\sqrt{0.36}} = \frac{1.0}{0.866\times0.6} = \frac{1.0}{0.52} = 1.92$$
Since the value of $r_{12.3}$ is greater than one, there is some inconsistency in the given data.
(b)
$$r_{12.3} = \frac{0.6 - (-0.4)(0.7)}{\sqrt{1-(-0.4)^2}\,\sqrt{1-(0.7)^2}} = \frac{0.6 + 0.28}{\sqrt{0.84}\,\sqrt{0.51}} = \frac{0.88}{0.917\times0.714} = \frac{0.88}{0.655} = 1.34$$
This again is greater than one, which is not possible.
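The check made in Illustration 4 can be automated. The sketch below is our own helper, not from the text; besides $r_{12.3}$ it reports the determinant of the 3 x 3 correlation matrix, $1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23}$. This determinant criterion is an addition to the text: squaring the condition $|r_{12.3}| \le 1$ and rearranging shows it is the same as requiring the determinant to be non-negative.

```python
import math


def check_consistency(r12: float, r13: float, r23: float):
    """Return (r12.3, determinant of the correlation matrix) for three coefficients.

    The three coefficients can come from real data only if the determinant is
    not negative, which is equivalent to |r12.3| <= 1.
    """
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r12_3 = (r12 - r13 * r23) / (math.sqrt(1 - r13**2) * math.sqrt(1 - r23**2))
    return r12_3, det


cases = {"(a)": (0.6, -0.5, 0.8), "(b)": (0.6, -0.4, 0.7)}  # (r12, r13, r23)
for label, (r12, r13, r23) in cases.items():
    r, det = check_consistency(r12, r13, r23)
    verdict = "consistent" if det >= 0 else "inconsistent"
    print(f"{label} r12.3 = {r:.3f}, determinant = {det:.3f}: {verdict}")
```

Both sets of data give a negative determinant, which agrees with the conclusion reached above.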
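Partial Correlation Coefficients in case of Four Variables
When four variables are involved in a correlation problem, there are twelve possible first-order coefficients. Some of these are:
$$r_{12.4} = \frac{r_{12} - r_{14}\,r_{24}}{\sqrt{1-r_{14}^2}\,\sqrt{1-r_{24}^2}}$$
$$r_{14.3} = \frac{r_{14} - r_{13}\,r_{34}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{34}^2}}$$
$$r_{13.4} = \frac{r_{13} - r_{14}\,r_{34}}{\sqrt{1-r_{14}^2}\,\sqrt{1-r_{34}^2}}$$
$$r_{24.3} = \frac{r_{24} - r_{23}\,r_{34}}{\sqrt{1-r_{23}^2}\,\sqrt{1-r_{34}^2}}$$
$$r_{34.2} = \frac{r_{34} - r_{23}\,r_{24}}{\sqrt{1-r_{23}^2}\,\sqrt{1-r_{24}^2}}$$
$$r_{23.4} = \frac{r_{23} - r_{24}\,r_{34}}{\sqrt{1-r_{24}^2}\,\sqrt{1-r_{34}^2}}$$
Similarly the formulae for the other partial correlation coefficients, i.e. $r_{12.3}$, $r_{13.2}$, $r_{14.2}$, $r_{23.1}$, $r_{24.1}$ and $r_{34.1}$, can also be written.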
Second-order Partial Correlation Coefficients

Second order coefficients may be obtained from first order coefficients. In case of four variables, if $r_{12.34}$ is the coefficient of partial correlation between $X_1$ and $X_2$ keeping $X_3$ and $X_4$ constant, then
$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{1-r_{14.3}^2}\,\sqrt{1-r_{24.3}^2}}$$
Similarly,
$$r_{13.24} = \frac{r_{13.2} - r_{14.2}\,r_{34.2}}{\sqrt{1-r_{14.2}^2}\,\sqrt{1-r_{34.2}^2}}$$
and
$$r_{23.14} = \frac{r_{23.1} - r_{24.1}\,r_{34.1}}{\sqrt{1-r_{24.1}^2}\,\sqrt{1-r_{34.1}^2}}$$


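Alternative formulae giving the same results are available for all three of the second-order coefficients. They are:
$$r_{12.34} = \frac{r_{12.4} - r_{13.4}\,r_{23.4}}{\sqrt{1-r_{13.4}^2}\,\sqrt{1-r_{23.4}^2}}$$
$$r_{13.24} = \frac{r_{13.4} - r_{12.4}\,r_{23.4}}{\sqrt{1-r_{12.4}^2}\,\sqrt{1-r_{23.4}^2}}$$
$$r_{23.14} = \frac{r_{23.4} - r_{12.4}\,r_{13.4}}{\sqrt{1-r_{12.4}^2}\,\sqrt{1-r_{13.4}^2}}$$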
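Because every coefficient of a given order is built from three coefficients of the next lower order, the calculation can be written recursively. The sketch below is a hypothetical implementation (the names `R`, `r0` and `partial` are our own). It uses the zero-order values of Illustration 14 and computes $r_{41.23}$ by both elimination orders, which should agree, just as the two sets of second-order formulae give the same result.

```python
import math
from functools import lru_cache

# Zero-order coefficients of Illustration 14, keyed by (smaller index, larger index)
R = {
    (1, 2): -0.20, (1, 3): 0.40, (2, 3): 0.50,
    (1, 4): 0.40, (2, 4): 0.30, (3, 4): -0.10,
}


def r0(i: int, j: int) -> float:
    """Zero-order coefficient r_ij (r_ij = r_ji)."""
    return R[(min(i, j), max(i, j))]


@lru_cache(maxsize=None)
def partial(i: int, j: int, given: tuple = ()) -> float:
    """Partial correlation of X_i and X_j holding the variables in `given` constant.

    The last variable in `given` is eliminated from coefficients of the next
    lower order, as in the first- and second-order formulae of the text.
    """
    if not given:
        return r0(i, j)
    rest, k = given[:-1], given[-1]
    r_ij, r_ik, r_jk = partial(i, j, rest), partial(i, k, rest), partial(j, k, rest)
    return (r_ij - r_ik * r_jk) / (math.sqrt(1 - r_ik**2) * math.sqrt(1 - r_jk**2))


# r41.23 eliminating X3 first (then X2), and eliminating X2 first (then X3)
route_a = partial(4, 1, (3, 2))
route_b = partial(4, 1, (2, 3))
print(round(route_a, 4), round(route_b, 4))
```

Route (a) follows exactly the steps of Illustration 14: it needs $r_{41.3}$, $r_{42.3}$ and $r_{12.3}$.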


The value of a partial correlation coefficient is usually interpreted via the corresponding coefficient of partial determination, which is merely the square of the former. Thus if $r_{12.3} = 0.4$, $r_{12.3}^2 = 0.16$.

The t-test employed to test the significance of a simple correlation can also be employed to test the significance of a partial correlation, provided the number of degrees of freedom is reduced by one for each variable eliminated.
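As a sketch, assuming the usual form of the t-test for a correlation coefficient, $t = r\sqrt{n}/\sqrt{1-r^2}$ with $n$ degrees of freedom, where $n = N - 2$ for a simple r and $n = N - 2 - k$ for a partial r of order $k$, the test can be written as below. The function name is our own; the figures are those of Illustration 5 later in this chapter.

```python
import math


def partial_t(r: float, n_obs: int, k: int) -> tuple[float, int]:
    """t statistic and degrees of freedom for a partial r of order k.

    Degrees of freedom: n_obs - 2 for a simple r, less one for each of the
    k variables held constant.
    """
    df = n_obs - 2 - k
    t = r * math.sqrt(df) / math.sqrt(1 - r**2)
    return t, df


t, df = partial_t(0.5, 41, 2)  # r12.34 = 0.5 with N = 41
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Here t is about 3.51 with 37 degrees of freedom, well above the 5 per cent two-tailed value of roughly 2.03, so this test and the Z method of Illustration 5 lead to the same conclusion.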

Characteristics and Uses of Partial Correlation Analysis. The function of partial correlation analysis is the measurement of the relationship between two factors, with the effects of one or more other factors eliminated. If the assumptions of the method are true for a series of data, the power of partial analysis is great. The problem of holding certain variables constant while the relationship between the others is measured often presents itself in statistical analysis. Partial correlation is especially useful in the analysis of interrelated series. It is particularly pertinent to uncontrolled experiments of various kinds, in which such interrelationships usually exist. Most economic data fall in this category.

Partial correlation is of greatest value when used in conjunction 
with gross and multiple correlation in the analysis of factors affecting 
variations in many kinds of phenomena. 

Partial analysis, like all correlation, has the advantage that the relationships are expressed concisely in a few well-defined coefficients. Also it is adaptable to small amounts of data and the reliability of the results can be rather easily tested.

Limitations of Partial Correlation Analysis. 1. The usefulness of the partial analysis is somewhat limited by the following basic assumptions of the method:

(i) The gross or zero-order correlations must have linear regressions.

(ii) The effects of the independent variables must be additive and not joint.

(iii) Because the reliability of partial coefficients decreases as their order increases, the number of observations in gross correlations should be fairly large. Often the student carries the analysis beyond the limits of the data. This weakness can to some extent be guarded against by tests of reliability.

2. When the above assumptions have been satisfied, partial analysis still possesses the disadvantages of laborious calculations and difficult interpretation, even for statisticians.

The interpretation of the partial and multiple correlation results tends to assume that the independent variables have causal effects on the dependent variable. This assumption is sometimes true, but more often untrue in varying degrees.

The Significance of a Partial Correlation Coefficient
The significance of a partial r may be determined most readily by way of the Z transformation. The S.E. of Z is $1/\sqrt{N-3}$, and the S.E. of the Z corresponding to $r_{12.3}$ is $1/\sqrt{N-4}$. One degree of freedom is subtracted from N for each variable eliminated, in addition to the 3 already lost. So for $r_{12.34}$ the S.E. of the corresponding Z is $1/\sqrt{N-5}$.


Illustration 5. Suppose that $r_{12.34} = 0.5$ and N = 41. Is the partial r significant?
Solution.
$$\text{S.E.} = \frac{1}{\sqrt{N-5}} = \frac{1}{\sqrt{41-5}} = \frac{1}{6} = 0.167$$

The Z corresponding to r of 0.50 is 0.55*.

The 95% confidence interval for the population Z is

$0.55 \pm 1.96 \times 0.167$, or from 0.223 to 0.877.

The $r_{12.34}$ is significant in the sense that the population r is not likely to be zero (the lower limit of the confidence range for Z is 0.223, which corresponds to an r of about 0.22). But the coefficient must be judged to be not very stable.
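The interval of Illustration 5 can be reproduced with Fisher's transformation $Z = \tanh^{-1} r = \tfrac{1}{2}\ln\frac{1+r}{1-r}$ in place of the table. This is a minimal sketch with a helper name of our own; converting the Z limits back to r with $\tanh$ is an extra step not taken in the text.

```python
import math


def partial_r_ci(r: float, n_obs: int, k: int, z_crit: float = 1.96):
    """Approximate confidence interval for a partial r of order k via Fisher's Z.

    S.E. of Z = 1 / sqrt(n_obs - 3 - k): one degree of freedom is lost for
    each variable held constant, in addition to the 3 already lost.
    """
    z = math.atanh(r)  # Z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n_obs - 3 - k)
    z_lo, z_hi = z - z_crit * se, z + z_crit * se
    return (z_lo, z_hi), (math.tanh(z_lo), math.tanh(z_hi))


(z_lo, z_hi), (r_lo, r_hi) = partial_r_ci(0.5, 41, 2)
print(f"Z: {z_lo:.3f} to {z_hi:.3f}; r: {r_lo:.3f} to {r_hi:.3f}")
```

The Z limits differ from 0.223 and 0.877 only in the third decimal because the text rounds Z to 0.55; in terms of r the interval runs from about 0.22 to 0.70.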




MULTIPLE CORRELATION 


In problems of multiple correlation we are dealing with situations that involve three or more variables. For example, we may consider the association between the yield of wheat per acre and both the amount of rainfall and the average daily temperature. We are trying to make estimates of the value of one of these variables based on the values of all the others.


* Please see table at the end of the book.
The variable whose value we are trying to estimate is called the dependent variable, and the other variables on which our estimates are based are known as independent variables. The statistician himself chooses which variable is to be dependent and which variables are to be independent; it merely depends on the problem being studied. If we are trying to determine the most probable weight of men, we make weight the dependent variable and height, age, etc., independent variables. If, on the other hand, we are interested in estimating height, we will make height the dependent variable and weight, age, etc., the independent variables. Thus in problems of multiple correlation we always have three or more variables (one dependent and the rest independent). In order that we may distinguish them easily, we follow the custom of representing them by the letter X with subscripts. The dependent variable is always denoted by $X_1$ and the others by $X_2$, $X_3$, etc. Thus in the height, age and weight problem, if we are trying to estimate men's weight (that is, if weight is the dependent variable), we might denote

$X_1$ = weight in lb.
$X_2$ = height in inches
$X_3$ = age in years.
Coefficient of Multiple Correlation
The coefficient of multiple linear correlation* is represented by R, and it is common to add subscripts designating the variables involved. Thus $R_{1.234}$ would represent the coefficient of multiple linear correlation between $X_1$ on the one hand, and $X_2$, $X_3$ and $X_4$ on the other. The subscript of the dependent variable is always to the left of the point.


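The coefficient of multiple correlation can be expressed in terms of $r_{12}$, $r_{13}$ and $r_{23}$ as follows:
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$
$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{13}^2}}$$
$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{12}^2}}$$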


A coefficient of multiple correlation such as $R_{1.23}$ lies between 0 and 1. The closer it is to 1, the better is the linear relationship between the variables; the closer it is to 0, the worse is the linear relationship. If the coefficient of multiple correlation is 1, the correlation is called perfect. Although a correlation coefficient of 0 indicates no linear relationship between the variables, it is possible that a non-linear relationship may exist. It should be noted that whereas the correlation coefficients range from +1.0 to 0 to -1.0, the coefficients of multiple correlation are always positive in sign and range from +1.0 to 0.

* When a linear regression equation is used, the coefficient of multiple correlation is called the coefficient of linear multiple correlation. Unless otherwise specified, whenever we refer to multiple correlation we shall imply linear multiple correlation.

By squaring $R_{1.23}$ we obtain the coefficient of multiple determination.


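An alternate formula for obtaining the value of $R_{1.23}$ is as follows:
$$R_{1.23} = \sqrt{r_{12}^2 + r_{13.2}^2\,(1 - r_{12}^2)}$$
or
$$R_{1.23}^2 = r_{12}^2 + r_{13.2}^2\,(1 - r_{12}^2)$$
Similarly
$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{13}^2}} \quad\text{or}\quad R_{2.13} = \sqrt{r_{12}^2 + r_{23.1}^2\,(1 - r_{12}^2)}$$
and
$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{12}^2}} \quad\text{or}\quad R_{3.12} = \sqrt{r_{13}^2 + r_{23.1}^2\,(1 - r_{13}^2)}$$

To determine a multiple coefficient with three independent variables the following formula may be used:
$$R_{1.234} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)}$$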
Illustration 6. The following zero-order correlation coefficients are given:
$r_{12} = 0.98$, $r_{13} = 0.44$ and $r_{23} = 0.54$.
Calculate the multiple correlation coefficient treating the first variable as dependent and the second and third variables as independent. (M. Com., Delhi, 1971)

Solution. We have to calculate the multiple correlation coefficient treating the first variable as dependent and the second and third variables as independent, i.e. we have to find $R_{1.23}$.
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$
Substituting the given values
$$R_{1.23} = \sqrt{\frac{(0.98)^2 + (0.44)^2 - 2(0.98)(0.44)(0.54)}{1 - (0.54)^2}} = \sqrt{\frac{0.9604 + 0.1936 - 0.4657}{0.7084}} = \sqrt{\frac{0.6883}{0.7084}} = \sqrt{0.9716} = 0.986$$
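A minimal Python sketch (helper names of our own) that evaluates $R_{1.23}$ by the first formula and by the alternative form $R_{1.23}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2)$, using the figures of Illustration 6. The two results should coincide.

```python
import math


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """R1.23 from the three zero-order coefficients."""
    return math.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))


def multiple_r_alt(r12: float, r13: float, r23: float) -> float:
    """R1.23 via the alternative form R^2 = r12^2 + r13.2^2 * (1 - r12^2)."""
    r13_2 = (r13 - r12 * r23) / (math.sqrt(1 - r12**2) * math.sqrt(1 - r23**2))
    return math.sqrt(r12**2 + r13_2**2 * (1 - r12**2))


# Zero-order coefficients of Illustration 6
print(round(multiple_r(0.98, 0.44, 0.54), 3), round(multiple_r_alt(0.98, 0.44, 0.54), 3))
```

Because the formula is symmetric in the two independent variables, the same function gives the other coefficients by relabelling: $R_{2.13}$ is `multiple_r(r12, r23, r13)` and $R_{3.12}$ is `multiple_r(r13, r23, r12)`.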


Advantages of Multiple Correlation Analysis. The coefficient of 
multiple correlation serves the following purposes : 
1. It serves as a measure of the degree of association between one
variable taken as the dependent variable and a group of other variables 
taken as the independent variables. 

2. Hence it also serves as a measure of goodness of fit of the cal- 
culated plane of regression and consequently as a measure of the general 
degree of accuracy of estimates made by reference to equation for the 
plane of regression. 

Limitations of Multiple Correlation Analysis. 1. Multiple correlation analysis is based on the assumption that the relationship between the variables is linear. In other words, the rate of change in one variable in terms of another is assumed to be constant for all values. In practice most relationships are not linear but follow some other pattern. This limits somewhat the use of multiple correlation analysis. The linear regression coefficients are not accurately descriptive of curvilinear data.

2. A second important limitation is the assumption that the effects of the independent variables on the dependent variable are separate, distinct and additive. When the effects of variables are additive, a given change in one has the same effect on the dependent variable regardless of the sizes of the other independent variables.

3. Linear multiple correlation involves a great deal of work relative to the results frequently obtained. When the results are obtained, only a few students well trained in the method are able to interpret them. The misuse of correlation results has probably cast more doubt on the method than is justified. However, this lack of understanding and the resulting misuse are due to the complexity of the method.


Multiple Regression Equation

A regression equation is an equation for estimating a dependent variable, say $X_1$, from the independent variables $X_2$, $X_3$, ..., and is called a regression equation of $X_1$ on $X_2$, $X_3$, .... In functional notation this is sometimes written briefly as $X_1 = F(X_2, X_3, \ldots)$, read "$X_1$ is a function of $X_2$, $X_3$ and so on".

For the case of three variables, the simplest regression equation of $X_1$ on $X_2$ and $X_3$ has the form
$$X_1 = a_{1.23} + b_{12.3}\,X_2 + b_{13.2}\,X_3 \qquad \text{(i)}$$

Due to the fact that $X_1$ varies partially because of variation in $X_2$ and partially because of variation in $X_3$, we call $b_{12.3}$ and $b_{13.2}$ the partial regression coefficients of $X_1$ on $X_2$ keeping $X_3$ constant and of $X_1$ on $X_3$ keeping $X_2$ constant, respectively.


Normal Equations for the Least Square Regression Plane

Just as there exist least square regression lines approximating a set of N data points (X, Y) in a two-dimensional scatter diagram, so also there exist least square regression planes fitting a set of N data points ($X_1$, $X_2$, $X_3$) in a three-dimensional scatter diagram.

The least square regression plane of $X_1$ on $X_2$ and $X_3$ has the equation (i), where $a_{1.23}$, $b_{12.3}$ and $b_{13.2}$ are determined by solving simultaneously the normal equations
$$\sum X_1 = N a_{1.23} + b_{12.3}\sum X_2 + b_{13.2}\sum X_3$$
$$\sum X_1X_2 = a_{1.23}\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2X_3$$
$$\sum X_1X_3 = a_{1.23}\sum X_3 + b_{12.3}\sum X_2X_3 + b_{13.2}\sum X_3^2$$

These can be obtained formally by multiplying both sides of equation (i) by 1, $X_2$ and $X_3$ successively and summing on both sides.

When the number of variables is 4 or more, solving the above system of normal equations becomes a very tedious procedure. Efficient methods of solving simultaneous equations require a knowledge of matrix algebra, which is not assumed for the reader of this text. Thus in our discussion that follows, we shall confine ourselves to the case of two independent variables, which, of course, can be extended to cover cases with three or more independent variables.

Deviations taken from Actual Means. Sometimes the work involved in finding these regression equations is reduced by proceeding in terms of deviations from the means of the variables under consideration. The regression equation for three variables in this procedure is:
$$x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3$$
where $x_1 = (X_1 - \bar X_1)$, $x_2 = (X_2 - \bar X_2)$, $x_3 = (X_3 - \bar X_3)$.

The values of $b_{12.3}$ and $b_{13.2}$ can be obtained by solving simultaneously the following two normal equations:
$$\sum x_1x_2 = b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3$$
$$\sum x_1x_3 = b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2$$
The values of $b_{12.3}$ and $b_{13.2}$ can also be obtained as follows:
$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}}, \qquad b_{13.2} = r_{13.2}\,\frac{\sigma_{1.23}}{\sigma_{3.12}}$$
The regression equation of $X_1$ on $X_2$ and $X_3$ can be expressed as follows:
$$X_1 - \bar X_1 = \left(\frac{r_{12} - r_{13}\,r_{23}}{1 - r_{23}^2}\right)\frac{S_1}{S_2}\,(X_2 - \bar X_2) + \left(\frac{r_{13} - r_{12}\,r_{23}}{1 - r_{23}^2}\right)\frac{S_1}{S_3}\,(X_3 - \bar X_3)$$
The regression equation of $X_3$ on $X_1$ and $X_2$ can be written as follows:
$$X_3 - \bar X_3 = \left(\frac{r_{13} - r_{12}\,r_{23}}{1 - r_{12}^2}\right)\frac{S_3}{S_1}\,(X_1 - \bar X_1) + \left(\frac{r_{23} - r_{12}\,r_{13}}{1 - r_{12}^2}\right)\frac{S_3}{S_2}\,(X_2 - \bar X_2)$$
This method of obtaining regression equations is much simpler than one where several normal equations have to be solved simultaneously. For calculating the regression equation for three variables when the above procedure is used, we need the following:

$\bar X_1$, $\bar X_2$, $\bar X_3$
$S_1$, $S_2$, $S_3$
$r_{12}$, $r_{13}$, $r_{23}$
Other Equations of Multiple Linear Regression

In the case of two variables there were two equations of regression; one of them indicating the regression of X on Y and the other that of Y on X. When there are three variables, there will be three equations of regression, one indicating the regression of $X_1$ on $X_2$ and $X_3$, the second indicating the regression of $X_2$ on $X_1$ and $X_3$, and the third indicating the regression of $X_3$ on $X_1$ and $X_2$. The first of these has been given earlier. If $X_2$ and $X_3$ were to be treated as dependent variables, the regression equations would respectively be:
$$X_2 = a_{2.13} + b_{21.3}\,X_1 + b_{23.1}\,X_3 \qquad \text{(ii)}$$
$$X_3 = a_{3.12} + b_{31.2}\,X_1 + b_{32.1}\,X_2 \qquad \text{(iii)}$$
The normal equations for fitting (ii) will be:
$$\sum X_2 = N a_{2.13} + b_{21.3}\sum X_1 + b_{23.1}\sum X_3$$
$$\sum X_1X_2 = a_{2.13}\sum X_1 + b_{21.3}\sum X_1^2 + b_{23.1}\sum X_1X_3$$
$$\sum X_2X_3 = a_{2.13}\sum X_3 + b_{21.3}\sum X_1X_3 + b_{23.1}\sum X_3^2$$
In case we want to fit equation (iii) the normal equations will be:
$$\sum X_3 = N a_{3.12} + b_{31.2}\sum X_1 + b_{32.1}\sum X_2$$
$$\sum X_1X_3 = a_{3.12}\sum X_1 + b_{31.2}\sum X_1^2 + b_{32.1}\sum X_1X_2$$
$$\sum X_2X_3 = a_{3.12}\sum X_2 + b_{31.2}\sum X_1X_2 + b_{32.1}\sum X_2^2$$
Generalizations for More Than Three Variables

In case of four variables the linear regression equation of $X_1$ on $X_2$, $X_3$ and $X_4$ can be written as
$$X_1 = a_{1.234} + b_{12.34}\,X_2 + b_{13.24}\,X_3 + b_{14.23}\,X_4$$
It represents a hyperplane in four-dimensional space. On formal multiplication of both sides of the above equation by 1, $X_2$, $X_3$ and $X_4$ successively and then summing on both sides, we obtain the normal equations for the determination of $a_{1.234}$, $b_{12.34}$, $b_{13.24}$ and $b_{14.23}$, which when substituted in the above equation give the least square regression equation of $X_1$ on $X_2$, $X_3$ and $X_4$.


The availability of these programmes enables many an analyst to obtain the desired regression and correlation results without having to spend time writing a computer programme. The suitability of a given library programme for use in a particular problem depends upon the input requirements, operating procedure and results computed by the programme. Many library programmes are sufficiently general and comprehensive to fulfil the requirements of a wide variety of users.

Illustration 7. Find the multiple linear regression equation of $X_1$ on $X_2$ and $X_3$ from the data relating to three variables given below:

 X1     4     6     7     9    13    15
 X2    15    12     8     6     4     3
 X3    30    24    20    14    10     4

(B.A., Bombay, 1973)
Solution. The regression equation of $X_1$ on $X_2$ and $X_3$ is
$$X_1 = a_{1.23} + b_{12.3}\,X_2 + b_{13.2}\,X_3$$
The values of the constants $a_{1.23}$, $b_{12.3}$ and $b_{13.2}$ are obtained by solving the following three normal equations:
$$\sum X_1 = N a_{1.23} + b_{12.3}\sum X_2 + b_{13.2}\sum X_3$$
$$\sum X_1X_2 = a_{1.23}\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2X_3$$
$$\sum X_1X_3 = a_{1.23}\sum X_3 + b_{12.3}\sum X_2X_3 + b_{13.2}\sum X_3^2$$
Calculating the required values:

   X1    X2    X3   X1X2   X1X3   X2X3   X2²    X3²   X1²
    4    15    30     60    120    450   225    900    16
    6    12    24     72    144    288   144    576    36
    7     8    20     56    140    160    64    400    49
    9     6    14     54    126     84    36    196    81
   13     4    10     52    130     40    16    100   169
   15     3     4     45     60     12     9     16   225
  ---   ---   ---   ----   ----   ----   ---   ----   ---
   54    48   102    339    720   1034   494   2188   576   (totals)

Substituting the values in the normal equations:
$$6a_{1.23} + 48b_{12.3} + 102b_{13.2} = 54 \qquad \text{(i)}$$
$$48a_{1.23} + 494b_{12.3} + 1034b_{13.2} = 339 \qquad \text{(ii)}$$
$$102a_{1.23} + 1034b_{12.3} + 2188b_{13.2} = 720 \qquad \text{(iii)}$$
Multiplying Eqn. (i) by 8, we get
$$48a_{1.23} + 384b_{12.3} + 816b_{13.2} = 432 \qquad \text{(iv)}$$
Subtracting Eqn. (ii) from (iv), we get
$$110b_{12.3} + 218b_{13.2} = -93 \qquad \text{(v)}$$
Multiplying Eqn. (i) by 17, we get
$$102a_{1.23} + 816b_{12.3} + 1734b_{13.2} = 918 \qquad \text{(vi)}$$
Subtracting Eqn. (iii) from Eqn. (vi), we get
$$218b_{12.3} + 454b_{13.2} = -198 \qquad \text{(vii)}$$
Multiplying Eqn. (v) by 109, we obtain
$$11990b_{12.3} + 23762b_{13.2} = -10137 \qquad \text{(viii)}$$
Multiplying Eqn. (vii) by 55, we get
$$11990b_{12.3} + 24970b_{13.2} = -10890 \qquad \text{(ix)}$$
Subtracting Eqn. (viii) from Eqn. (ix), we get
$$1208b_{13.2} = -753 \quad\text{or}\quad b_{13.2} = \frac{-753}{1208} = -0.623$$
Substituting this value of $b_{13.2}$ in Eqn. (v), we get
$$110b_{12.3} + 218(-0.623) = -93$$
$$110b_{12.3} - 135.814 = -93 \quad\text{or}\quad b_{12.3} = \frac{42.814}{110} = 0.389$$
Substituting the values of $b_{12.3}$ and $b_{13.2}$ in Eqn. (i), we get
$$6a_{1.23} + 48(0.389) + 102(-0.623) = 54$$
$$6a_{1.23} = 54 + 63.546 - 18.672 = 98.874 \quad\text{or}\quad a_{1.23} = 16.479$$
Thus the required regression equation is:
$$X_1 = 16.479 + 0.389\,X_2 - 0.623\,X_3$$
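The normal equations above can also be formed and solved mechanically, which is what makes the extension to four or more variables practical. The sketch below is our own plain-Python function (it is not any particular library programme): it builds the normal equations by "multiplying by 1, X2, X3, ... and summing" and solves them by Gaussian elimination, here for the data of Illustration 7.

```python
def fit_plane(y, xs):
    """Least-squares coefficients [a, b1, b2, ...] of y on the columns in xs.

    The normal equations are built by multiplying the regression equation by
    1, X2, X3, ... and summing, then solved by Gaussian elimination.
    """
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in xs]  # 1s give the constant a
    m = len(cols)
    # Augmented matrix: row i holds sum(c_i * c_j) for each j, then sum(c_i * y)
    A = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(m)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(m)]
    for c in range(m):  # forward elimination with partial pivoting
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, m):
            f = A[r][c] / A[c][c]
            A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    coef = [0.0] * m
    for r in reversed(range(m)):  # back substitution
        s = sum(A[r][k] * coef[k] for k in range(r + 1, m))
        coef[r] = (A[r][m] - s) / A[r][r]
    return coef


# Data of Illustration 7
X1 = [4, 6, 7, 9, 13, 15]
X2 = [15, 12, 8, 6, 4, 3]
X3 = [30, 24, 20, 14, 10, 4]

a, b12_3, b13_2 = fit_plane(X1, [X2, X3])
print(f"X1 = {a:.3f} + {b12_3:.3f} X2 + ({b13_2:.3f}) X3")
```

Without intermediate rounding the coefficients come out as about 16.478, 0.390 and -0.623; the small differences from the hand calculation arise because $b_{13.2}$ is rounded to -0.623 before back-substitution.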
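Illustration 8. Given the following, determine the regression equation of:
(i) $x_1$ on $x_2$ and $x_3$
(ii) $x_2$ on $x_1$ and $x_3$
$r_{12} = 0.8$, $r_{13} = 0.6$, $r_{23} = 0.5$
$\sigma_1 = 10$, $\sigma_2 = 8$, $\sigma_3 = 5$.
Solution. (i) The regression equation of $X_1$ on $X_2$ and $X_3$ is given by
$$X_1 = a + b_{12.3}\,X_2 + b_{13.2}\,X_3$$
If the variates $X_1$, $X_2$ and $X_3$ are measured as deviations from their respective means, 'a' will be zero. The values of $b_{12.3}$ and $b_{13.2}$ can be calculated from the data given above, but not that of 'a'. So let us assume that $x_1$, $x_2$ and $x_3$ represent deviations from the means. The regression equation of $x_1$ on $x_2$ and $x_3$ is then
$$x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3$$
$$b_{12.3} = \frac{\sigma_1}{\sigma_2}\times\frac{r_{12} - r_{13}\,r_{23}}{1 - r_{23}^2} = \frac{10}{8}\times\frac{0.8 - (0.6)(0.5)}{1 - (0.5)^2} = \frac{10}{8}\times\frac{0.5}{0.75} = 0.833$$
$$b_{13.2} = \frac{\sigma_1}{\sigma_3}\times\frac{r_{13} - r_{12}\,r_{23}}{1 - r_{23}^2} = \frac{10}{5}\times\frac{0.6 - (0.8)(0.5)}{1 - (0.5)^2} = \frac{10}{5}\times\frac{0.2}{0.75} = 0.533$$
The required regression equation is
$$x_1 = 0.833\,x_2 + 0.533\,x_3$$
(ii) The regression equation of $x_2$ on $x_1$ and $x_3$ is
$$x_2 = b_{21.3}\,x_1 + b_{23.1}\,x_3$$
$$b_{21.3} = \frac{\sigma_2}{\sigma_1}\times\frac{r_{12} - r_{23}\,r_{13}}{1 - r_{13}^2} = \frac{8}{10}\times\frac{0.8 - (0.5)(0.6)}{1 - (0.6)^2} = \frac{8}{10}\times\frac{0.5}{0.64} = 0.625$$
$$b_{23.1} = \frac{\sigma_2}{\sigma_3}\times\frac{r_{23} - r_{12}\,r_{13}}{1 - r_{13}^2} = \frac{8}{5}\times\frac{0.5 - (0.8)(0.6)}{1 - (0.6)^2} = \frac{8}{5}\times\frac{0.02}{0.64} = 0.05$$
Thus
$$x_2 = 0.625\,x_1 + 0.05\,x_3$$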
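Every coefficient in Illustration 8 follows the single pattern $b_{ij.k} = \frac{\sigma_i}{\sigma_j}\cdot\frac{r_{ij} - r_{ik}r_{jk}}{1 - r_{jk}^2}$. A minimal Python sketch of that pattern (the function name `b` and the dictionary layout are our own choices):

```python
def b(i: int, j: int, k: int, r: dict, s: dict) -> float:
    """Partial regression coefficient b_ij.k of X_i on X_j with X_k held constant.

    b_ij.k = (s_i / s_j) * (r_ij - r_ik * r_jk) / (1 - r_jk**2)
    """
    def rr(p: int, q: int) -> float:
        return r[(min(p, q), max(p, q))]  # r_pq = r_qp

    return (s[i] / s[j]) * (rr(i, j) - rr(i, k) * rr(j, k)) / (1 - rr(j, k) ** 2)


# Data of Illustration 8
r = {(1, 2): 0.8, (1, 3): 0.6, (2, 3): 0.5}
s = {1: 10, 2: 8, 3: 5}

print(round(b(1, 2, 3, r, s), 3), round(b(1, 3, 2, r, s), 3))  # x1 on x2 and x3
print(round(b(2, 1, 3, r, s), 3), round(b(2, 3, 1, r, s), 3))  # x2 on x1 and x3
```

With the figures of Illustration 9 (σ1 = 2, σ2 = σ3 = 3, r12 = 0.7, r23 = r31 = 0.5) the same function gives $b_{12.3} = 0.4$ and $b_{13.2} \approx 0.133$.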


Reliability of Estimates

The problem of determining the accuracy of estimates from the multiple regression equation is basically the same as for estimates from a simple regression equation. Since the correlation is seldom perfect, estimates made from the regression equation will deviate from the correct value of the dependent variable. If an estimate is to be of maximum usefulness, it is necessary to have some indication of its precision. Just as with the simple regression equation, the measure of reliability is an average of the deviations of the actual values of the dependent variable from the estimates made from the regression equation; in other words, the standard error of estimate.

The standard error of estimate of $X_1$ on $X_2$ and $X_3$ is defined as
$$S_{1.23} = \sqrt{\frac{\sum (X_1 - X_{1.23})^2}{N}}$$
where $S_{1.23}$ represents the standard error of estimate of $X_1$ on $X_2$ and $X_3$, and $X_{1.23}$ indicates the estimated value of $X_1$ as calculated from the regression equation.
In terms of the correlation coefficients $r_{12}$, $r_{13}$ and $r_{23}$, the standard error of estimate can also be computed from the result:
$$S_{1.23} = S_1\sqrt{\frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$
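Since the numerator above equals $(1 - r_{23}^2) - (r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23})$, the result can also be written $S_{1.23} = S_1\sqrt{1 - R_{1.23}^2}$. The sketch below (helper names our own) checks this numerically with the figures of Illustration 8, i.e. $S_1 = 10$, $r_{12} = 0.8$, $r_{13} = 0.6$, $r_{23} = 0.5$.

```python
import math


def se_estimate(s1: float, r12: float, r13: float, r23: float) -> float:
    """S1.23 = s1 * sqrt((1 - r12^2 - r13^2 - r23^2 + 2 r12 r13 r23) / (1 - r23^2))."""
    num = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    return s1 * math.sqrt(num / (1 - r23**2))


def r_squared(r12: float, r13: float, r23: float) -> float:
    """Coefficient of multiple determination R^2 of X1 on X2 and X3."""
    return (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)


s = se_estimate(10, 0.8, 0.6, 0.5)
via_r2 = 10 * math.sqrt(1 - r_squared(0.8, 0.6, 0.5))
print(round(s, 3), round(via_r2, 3))  # both routes give the same value
```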
MISCELLANEOUS ILLUSTRATIONS 


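Illustration 9. In a trivariate distribution
$\sigma_1 = 2$, $\sigma_2 = \sigma_3 = 3$
$r_{12} = 0.7$, $r_{23} = r_{31} = 0.5$
Find (i) $b_{12.3}$ and (ii) $b_{13.2}$.

Solution:
(i)
$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.3}}{\sigma_{2.3}}$$
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.7 - (0.5)(0.5)}{\sqrt{1-(0.5)^2}\,\sqrt{1-(0.5)^2}} = \frac{0.7 - 0.25}{0.75} = \frac{0.45}{0.75} = 0.6$$
$$\sigma_{1.3} = \sigma_1\sqrt{1-r_{13}^2} = 2\sqrt{1-0.25} = 1.732$$
$$\sigma_{2.3} = \sigma_2\sqrt{1-r_{23}^2} = 3\sqrt{1-0.25} = 2.598$$
$$b_{12.3} = 0.6\times\frac{1.732}{2.598} = 0.4$$
(ii)
$$b_{13.2} = r_{13.2}\,\frac{\sigma_{1.2}}{\sigma_{3.2}}$$
$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.5 - (0.7)(0.5)}{\sqrt{1-(0.7)^2}\,\sqrt{1-(0.5)^2}} = \frac{0.15}{\sqrt{0.51}\,\sqrt{0.75}} = \frac{0.15}{0.6185} = 0.2425$$
$$\sigma_{1.2} = \sigma_1\sqrt{1-r_{12}^2} = 2\sqrt{1-0.49} = 1.428$$
$$\sigma_{3.2} = \sigma_3\sqrt{1-r_{23}^2} = 3\sqrt{1-0.25} = 2.598$$
$$b_{13.2} = 0.2425\times\frac{1.428}{2.598} = 0.133$$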
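Illustration 10. In a trivariate distribution,
$\sigma_1 = 3$, $\sigma_2 = \sigma_3 = 5$
$r_{12} = 0.6$, $r_{23} = r_{31} = 0.8$
Find (i) $r_{23.1}$ and (ii) $R_{1.23}$.

Solution.
(i)
$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13}^2}} = \frac{0.8 - (0.6\times0.8)}{\sqrt{1-(0.6)^2}\,\sqrt{1-(0.8)^2}} = \frac{0.8 - 0.48}{\sqrt{0.64}\,\sqrt{0.36}} = \frac{0.32}{0.48} = 0.667$$
(ii)
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.6)^2 + (0.8)^2 - 2(0.6)(0.8)(0.8)}{1 - (0.8)^2}} = \sqrt{\frac{0.36 + 0.64 - 0.768}{0.36}} = \sqrt{\frac{0.232}{0.36}} = \sqrt{0.6444} = 0.803$$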
Illustration 11. The correlation between a general intelligence test and school achievement in a group of children from 6 to 15 years old is 0.80. The correlation between the general intelligence test and age in the same group is 0.70, and the correlation between school achievement and age is 0.60. What is the correlation between general intelligence and school achievement in children of the same age? Comment upon your result.

Solution. Let $X_1$ denote general intelligence test
$X_2$ denote school achievement
$X_3$ denote age.
We are given:
$r_{12} = 0.8$, $r_{13} = 0.7$ and $r_{23} = 0.6$
We are to find $r_{12.3}$.
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.8 - (0.7\times0.6)}{\sqrt{1-(0.7)^2}\,\sqrt{1-(0.6)^2}} = \frac{0.8 - 0.42}{\sqrt{0.51}\,\sqrt{0.64}} = \frac{0.38}{0.5713} = 0.665$$
When age is held constant, the correlation between general intelligence and school achievement falls from 0.80 to about 0.67; part of the observed association is therefore due to the fact that both are related to age.
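Illustration 12. Given $r_{12} = 0.28$, $r_{23} = 0.49$, $r_{31} = 0.51$, $\sigma_1 = 2.7$, $\sigma_2 = 2.4$, $\sigma_3 = 2.7$.
Find the regression equation of $x_3$ on $x_1$ and $x_2$.

Solution. The regression equation of $x_3$ on $x_1$ and $x_2$ is
$$x_3 = b_{31.2}\,x_1 + b_{32.1}\,x_2$$
where
$$b_{31.2} = r_{31.2}\,\frac{\sigma_{3.12}}{\sigma_{1.23}}, \qquad b_{32.1} = r_{32.1}\,\frac{\sigma_{3.12}}{\sigma_{2.13}}$$
Hence we should find the values of $r_{31.2}$, $r_{32.1}$, $\sigma_{3.12}$, $\sigma_{1.23}$ and $\sigma_{2.13}$.
$$r_{31.2} = \frac{r_{31} - r_{23}\,r_{12}}{\sqrt{1-r_{23}^2}\,\sqrt{1-r_{12}^2}} = \frac{0.51 - (0.49\times0.28)}{\sqrt{1-(0.49)^2}\,\sqrt{1-(0.28)^2}} = \frac{0.51 - 0.1372}{\sqrt{0.7599}\,\sqrt{0.9216}} = \frac{0.3728}{0.8368} = 0.445$$
$$r_{32.1} = \frac{r_{32} - r_{31}\,r_{12}}{\sqrt{1-r_{31}^2}\,\sqrt{1-r_{12}^2}} = \frac{0.49 - (0.51\times0.28)}{\sqrt{1-(0.51)^2}\,\sqrt{1-(0.28)^2}} = \frac{0.49 - 0.1428}{\sqrt{0.7399}\,\sqrt{0.9216}} = \frac{0.3472}{0.8258} = 0.420$$
$$\sigma_{3.12} = \sigma_3\sqrt{1-r_{31}^2}\,\sqrt{1-r_{32.1}^2} = 2.7\times\sqrt{1-(0.51)^2}\times\sqrt{1-(0.42)^2} = 2.7\times0.86\times0.91 = 2.11$$
Since we do not know the value of $r_{12.3}$, we should first calculate it:
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.28 - (0.51\times0.49)}{\sqrt{1-(0.51)^2}\,\sqrt{1-(0.49)^2}} = \frac{0.28 - 0.2499}{0.86\times0.87} = 0.04$$
$$\sigma_{1.23} = \sigma_1\sqrt{1-r_{13}^2}\,\sqrt{1-r_{12.3}^2} = 2.7\times0.86\times\sqrt{1-(0.04)^2} = 2.32$$
$$\sigma_{2.13} = \sigma_2\sqrt{1-r_{23}^2}\,\sqrt{1-r_{12.3}^2} = 2.4\times0.87\times\sqrt{1-(0.04)^2} = 2.09$$
$$b_{31.2} = 0.445\times\frac{2.11}{2.32} = 0.405, \qquad b_{32.1} = 0.420\times\frac{2.11}{2.09} = 0.424$$
Substituting the values of $b_{31.2}$ and $b_{32.1}$ in the equation
$$x_3 = 0.405\,x_1 + 0.424\,x_2$$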

Illustration 13. The correlation between a general intelligence test and school achievement in a group of children from 8 to 14 years old is 0.80. The correlation between the general intelligence test and age in the same group is 0.70, and the correlation between school achievement and age is 0.60. What is the correlation between general intelligence and school achievement in children of the same age?

Solution. We are given:

Correlation between general intelligence test and school achievement, i.e. $r_{12} = 0.80$.
Correlation between the general intelligence test and age, i.e. $r_{13} = 0.70$.
Correlation between school achievement and age, i.e. $r_{23} = 0.60$.

We have to find the correlation between general intelligence and school achievement in children of the same age, i.e. we have to find $r_{12.3}$.
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.8 - (0.7\times0.6)}{\sqrt{1-(0.7)^2}\,\sqrt{1-(0.6)^2}} = \frac{0.38}{0.5713} = 0.665$$
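Illustration 14. Given the following information:
$r_{12} = -0.20$, $r_{13} = 0.40$, $r_{23} = 0.50$
$r_{14} = 0.40$, $r_{24} = 0.30$, $r_{34} = -0.10$
Find $r_{41.23}$.

Solution.
$$r_{41.23} = \frac{r_{41.3} - r_{42.3}\,r_{12.3}}{\sqrt{1-r_{42.3}^2}\,\sqrt{1-r_{12.3}^2}}$$
We have to find first $r_{41.3}$, $r_{12.3}$ and $r_{24.3}$.
$$r_{41.3} = \frac{r_{41} - r_{43}\,r_{13}}{\sqrt{1-r_{43}^2}\,\sqrt{1-r_{13}^2}} = \frac{0.4 - (0.4\times -0.1)}{\sqrt{1-(-0.1)^2}\,\sqrt{1-(0.4)^2}} = \frac{0.44}{\sqrt{0.99}\,\sqrt{0.84}} = \frac{0.44}{0.912} = 0.482$$
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{-0.2 - (0.4\times0.5)}{\sqrt{1-(0.4)^2}\,\sqrt{1-(0.5)^2}} = \frac{-0.4}{\sqrt{0.84}\,\sqrt{0.75}} = \frac{-0.4}{0.794} = -0.504$$
$$r_{24.3} = \frac{r_{24} - r_{34}\,r_{23}}{\sqrt{1-r_{34}^2}\,\sqrt{1-r_{23}^2}} = \frac{0.3 - (-0.1\times0.5)}{\sqrt{1-(-0.1)^2}\,\sqrt{1-(0.5)^2}} = \frac{0.35}{\sqrt{0.99}\,\sqrt{0.75}} = \frac{0.35}{0.862} = 0.406$$
Substituting the values:
$$r_{41.23} = \frac{0.482 - (0.406\times -0.504)}{\sqrt{1-(0.406)^2}\,\sqrt{1-(-0.504)^2}} = \frac{0.687}{0.914\times0.864} = \frac{0.687}{0.790} = 0.87$$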

Illustration 15. An instructor of mathematics wishes to determine the relationship of grades on a final examination to grades on two quizzes given during the semester. Calling $X_1$, $X_2$ and $X_3$ the grades of a student on the first quiz, second quiz and final examination respectively, he made the following computations for a total of 120 students:

$\bar X_1 = 6.8$   $\bar X_2 = 7.0$   $\bar X_3 = 74$
$S_1 = 1.0$   $S_2 = 0.80$   $S_3 = 9.0$
$r_{12} = 0.60$   $r_{13} = 0.70$   $r_{23} = 0.65$

(i) Find the least square regression equation of $X_3$ on $X_1$ and $X_2$.
(ii) Estimate the final grades of two students who scored respectively 9 and 7, and 4 and 8, on the two quizzes.

Solution. The regression equation of $X_3$ on $X_1$ and $X_2$ can be written as:
$$X_3 - \bar X_3 = \left(\frac{r_{13} - r_{12}\,r_{23}}{1 - r_{12}^2}\right)\frac{S_3}{S_1}(X_1 - \bar X_1) + \left(\frac{r_{23} - r_{12}\,r_{13}}{1 - r_{12}^2}\right)\frac{S_3}{S_2}(X_2 - \bar X_2)$$
Substituting the given values:
$$X_3 - 74 = \left(\frac{0.7 - (0.6\times0.65)}{1 - (0.6)^2}\right)\frac{9}{1.0}(X_1 - 6.8) + \left(\frac{0.65 - (0.6\times0.7)}{1 - (0.6)^2}\right)\frac{9}{0.8}(X_2 - 7.0)$$
$$X_3 - 74 = \left(\frac{0.31}{0.64}\right)(9)(X_1 - 6.8) + \left(\frac{0.23}{0.64}\right)(11.25)(X_2 - 7.0)$$
$$X_3 - 74 = 4.36(X_1 - 6.8) + 4.04(X_2 - 7.0)$$
$$X_3 - 74 = 4.36X_1 - 29.65 + 4.04X_2 - 28.28$$
$$X_3 = 16.07 + 4.36X_1 + 4.04X_2$$
Final grade of the student who scored 9 and 7 marks ($X_1 = 9$, $X_2 = 7$):
$$X_3 = 16.07 + 4.36(9) + 4.04(7) = 16.07 + 39.24 + 28.28 = 83.59 \text{ or } 84$$
Final grade of the student who scored 4 and 8 marks ($X_1 = 4$, $X_2 = 8$):
$$X_3 = 16.07 + 4.36(4) + 4.04(8) = 16.07 + 17.44 + 32.32 = 65.83 \text{ or } 66$$
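The deviation form of the regression equation of $X_3$ on $X_1$ and $X_2$ needs only means, standard deviations and the three r's. The sketch below (our own helper, `regression_x3`) reproduces Illustration 15 without intermediate rounding and, as a second check, the egg example of Illustration 17.

```python
def regression_x3(means, sds, r12, r13, r23):
    """(a, b31.2, b32.1) for X3 = a + b31.2*X1 + b32.1*X2 from means, S.D.s and r's."""
    m1, m2, m3 = means
    s1, s2, s3 = sds
    b31_2 = (r13 - r12 * r23) / (1 - r12**2) * s3 / s1
    b32_1 = (r23 - r12 * r13) / (1 - r12**2) * s3 / s2
    a = m3 - b31_2 * m1 - b32_1 * m2
    return a, b31_2, b32_1


# Illustration 15: quiz marks X1, X2 and final examination grade X3
a, b1, b2 = regression_x3((6.8, 7.0, 74), (1.0, 0.80, 9.0), 0.60, 0.70, 0.65)
print(f"X3 = {a:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
for x1, x2 in [(9, 7), (4, 8)]:
    print(f"quiz marks {x1} and {x2}: estimated final grade {a + b1 * x1 + b2 * x2:.2f}")

# Illustration 17: egg length X1, volume X2, weight X3
ea, eb1, eb2 = regression_x3((55.95, 51.48, 56.03), (2.26, 4.39, 4.41), 0.578, 0.581, 0.974)
print(f"egg of 58 mm and 52.5 c.c.: about {ea + eb1 * 58 + eb2 * 52.5:.2f} gm")
```

Carried out without rounding, the constant comes to about 16.06 rather than 16.07, but the two estimated grades (about 83.59 and 65.84) still round to 84 and 66.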
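Illustration 16. Calculate (a) $R_{1.23}$, (b) $R_{3.12}$ and (c) $R_{2.13}$ for the following data:

$\bar X_1 = 6.8$   $\bar X_2 = 7.0$   $\bar X_3 = 74$
$S_1 = 1.0$   $S_2 = 0.8$   $S_3 = 9$
$r_{12} = 0.6$   $r_{13} = 0.7$   $r_{23} = 0.65$

Solution.
(a)
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.6)^2 + (0.7)^2 - 2(0.6)(0.7)(0.65)}{1 - (0.65)^2}} = \sqrt{\frac{0.36 + 0.49 - 0.546}{0.5775}} = \sqrt{0.527} = 0.726$$
(b)
$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{12}^2}} = \sqrt{\frac{(0.7)^2 + (0.65)^2 - 2(0.6)(0.7)(0.65)}{1 - (0.6)^2}} = \sqrt{\frac{0.49 + 0.4225 - 0.546}{0.64}} = \sqrt{0.573} = 0.757$$
(c)
$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{13}^2}} = \sqrt{\frac{(0.6)^2 + (0.65)^2 - 2(0.6)(0.7)(0.65)}{1 - (0.7)^2}} = \sqrt{\frac{0.36 + 0.4225 - 0.546}{0.51}} = \sqrt{0.464} = 0.681$$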

Illustration 17. The following constants are obtained from measurements on
length in mm (X1), volume in c.c. (X2) and weight in gm (X3) of 300 eggs:

X̄1 = 55.95    S1 = 2.26    r12 = 0.578
X̄2 = 51.48    S2 = 4.39    r13 = 0.581
X̄3 = 56.03    S3 = 4.41    r23 = 0.974

Obtain the linear regression equation of egg weight on egg length and egg
volume. Hence estimate the weight of an egg whose length is 58 mm and volume is
52.5 c.c.

Solution. We have to obtain the linear regression equation of egg weight on egg
length and egg volume, i.e., X3 on X1 and X2. The regression equation of X3 on X1 and
X2 can be written as:

$$X_3 - \bar{X}_3 = \left(\frac{r_{23} - r_{12}r_{13}}{1 - r_{12}^2}\right)\frac{S_3}{S_2}(X_2 - \bar{X}_2) + \left(\frac{r_{13} - r_{12}r_{23}}{1 - r_{12}^2}\right)\frac{S_3}{S_1}(X_1 - \bar{X}_1)$$

Substituting the values:

$$X_3 - 56.03 = \left(\frac{0.974 - 0.581 \times 0.578}{1 - (0.578)^2}\right)\frac{4.41}{4.39}(X_2 - 51.48) + \left(\frac{0.581 - 0.974 \times 0.578}{1 - (0.578)^2}\right)\frac{4.41}{2.26}(X_1 - 55.95)$$

X3 − 56.03 = 0.963(X2 − 51.48) + 0.053(X1 − 55.95)
X3 − 56.03 = 0.963X2 − 49.575 + 0.053X1 − 2.965
X3 = 3.49 + 0.053X1 + 0.963X2

When length, i.e., X1, is 58 and volume, i.e., X2, is 52.5, the weight of the egg would be:
X3 = 3.49 + 0.053(58) + 0.963(52.5)
   = 3.49 + 3.074 + 50.558 = 57.12 gm.
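The same coefficients can be obtained another way: by solving the normal equations in standardised (correlation) form, β1 + r12β2 = r13 and r12β1 + β2 = r23, and then rescaling each β by S3/S1 or S3/S2. The Python sketch below does this with Cramer's rule. The variable names are illustrative. Without intermediate rounding it gives b1 ≈ 0.0528 and b2 ≈ 0.9627. The intercept therefore comes out slightly different from the rounded one above, at about 3.51, while the estimated weight is still about 57.12 gm.

```python
# Sketch: regression of egg weight (X3) on length (X1) and volume (X2),
# obtained by solving the normal equations in correlation (standardised) form
#   beta1 + r12*beta2 = r13
#   r12*beta1 + beta2 = r23
# with Cramer's rule, then rescaling by the standard deviations.

means = {"X1": 55.95, "X2": 51.48, "X3": 56.03}
sds = {"X1": 2.26, "X2": 4.39, "X3": 4.41}
r12, r13, r23 = 0.578, 0.581, 0.974

det = 1 - r12 * r12                  # determinant of [[1, r12], [r12, 1]]
beta1 = (r13 - r12 * r23) / det      # standardised coefficient of X1
beta2 = (r23 - r12 * r13) / det      # standardised coefficient of X2

b1 = beta1 * sds["X3"] / sds["X1"]   # grams per mm of length
b2 = beta2 * sds["X3"] / sds["X2"]   # grams per c.c. of volume
a = means["X3"] - b1 * means["X1"] - b2 * means["X2"]

weight = a + b1 * 58 + b2 * 52.5     # egg of length 58 mm and volume 52.5 c.c.
print(f"X3 = {a:.3f} + {b1:.4f} X1 + {b2:.4f} X2; estimated weight {weight:.2f} gm")
```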
Illustration 18. The table shows the corresponding values of three variables X1,
X2 and X3. Find the least square regression equation of X3 on X1 and X2. Estimate
X3 when X1 = 10 and X2 = 6.

X1:    3     5     6     8    12    14
X2:   16    10     7     4     3     2
X3:   90    72    54    42    30    12

Solution. The regression equation of X3 on X1 and X2 can be written as:

$$X_3 - \bar{X}_3 = \left(\frac{r_{23} - r_{12}r_{13}}{1 - r_{12}^2}\right)\frac{S_3}{S_2}(X_2 - \bar{X}_2) + \left(\frac{r_{13} - r_{12}r_{23}}{1 - r_{12}^2}\right)\frac{S_3}{S_1}(X_1 - \bar{X}_1)$$

Calculating X̄1, X̄2, X̄3, S1, S2, S3, r12, r13, r23
(x1 = X1 − X̄1, x2 = X2 − X̄2, x3 = X3 − X̄3, with X̄1 = 48/6 = 8, X̄2 = 42/6 = 7, X̄3 = 300/6 = 50):

  X1    x1   x1²    X2    x2   x2²    X3    x3    x3²   x1x2   x1x3   x2x3
   3    −5    25    16     9    81    90    40   1600    −45   −200    360
   5    −3     9    10     3     9    72    22    484     −9    −66     66
   6    −2     4     7     0     0    54     4     16      0     −8      0
   8     0     0     4    −3     9    42    −8     64      0      0     24
  12     4    16     3    −4    16    30   −20    400    −16    −80     80
  14     6    36     2    −5    25    12   −38   1444    −30   −228    190

Totals: ΣX1 = 48, Σx1 = 0, Σx1² = 90, ΣX2 = 42, Σx2 = 0, Σx2² = 140, ΣX3 = 300,
Σx3 = 0, Σx3² = 4008, Σx1x2 = −100, Σx1x3 = −582, Σx2x3 = 720.

X3 − 50 = 2.546(X2 − 7) − 3.664(X1 − 8)
X3 − 50 = 2.546X2 − 17.822 − 3.664X1 + 29.312
X3 = 2.546X2 − 3.664X1 + 61.49
When X1 = 10 and X2 = 6, X3 will be
X3 = 15.276 − 36.64 + 61.49 = 40.126 or 40.
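Because the raw observations are available here, the regression can also be fitted directly from the normal equations in deviation form, Σx1x3 = b1Σx1² + b2Σx1x2 and Σx2x3 = b1Σx1x2 + b2Σx2², using the column totals above. The Python sketch below illustrates this route and is not the book's procedure. Solving without rounding the standard deviations and correlation coefficients gives b1 ≈ −3.646 and b2 ≈ 2.538, slightly different from the rounded coefficients above. It also gives an estimate of about 40.17, which still rounds to 40.

```python
# Sketch: least-squares regression of X3 on X1 and X2 from raw observations,
# using the normal equations in deviation form:
#   sum(x1*x3) = b1*sum(x1^2)  + b2*sum(x1*x2)
#   sum(x2*x3) = b1*sum(x1*x2) + b2*sum(x2^2)

X1 = [3, 5, 6, 8, 12, 14]
X2 = [16, 10, 7, 4, 3, 2]
X3 = [90, 72, 54, 42, 30, 12]


def mean(values):
    return sum(values) / len(values)


def sp(u, v):
    """Sum of products of two lists of deviations."""
    return sum(p * q for p, q in zip(u, v))


m1, m2, m3 = mean(X1), mean(X2), mean(X3)
x1 = [v - m1 for v in X1]
x2 = [v - m2 for v in X2]
x3 = [v - m3 for v in X3]

s11, s22, s12 = sp(x1, x1), sp(x2, x2), sp(x1, x2)   # 90, 140, -100
s13, s23 = sp(x1, x3), sp(x2, x3)                   # -582, 720

# Cramer's rule for [[s11, s12], [s12, s22]] @ [b1, b2] = [s13, s23]
det = s11 * s22 - s12 ** 2
b1 = (s13 * s22 - s12 * s23) / det   # coefficient of X1
b2 = (s11 * s23 - s12 * s13) / det   # coefficient of X2
a = m3 - b1 * m1 - b2 * m2

print(f"X3 = {a:.2f} + ({b1:.3f}) X1 + {b2:.3f} X2")
print("estimate at X1 = 10, X2 = 6:", round(a + b1 * 10 + b2 * 6, 2))
```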
Statistical Decision Theory


The businessman has to operate in an atmosphere of uncertainty
and has to select the best course out of several alternative courses of
action that may be available to him. In earlier days decisions were made
purely on personal judgment. These days, however, judgment is combined
with several quantitative techniques to arrive at the best action in a
given situation. The procedures for testing hypotheses presented in an
earlier chapter were designed to test a statistical statement about a
population (the null hypothesis) at a given level of significance. We had
two alternative courses of action, and our main task was to establish
some criterion (decision rule) for choosing between these two acts. We
reached a decision based upon an event (sample evidence) evaluated in
the light of our criterion or decision rule. Our conclusions and course of
action were based upon the so-called objective interpretation of
probability, stated in terms of relative frequency. Our interest was
focused upon the course of action, while the test related to the null
hypothesis.

A supplementary analysis of these problems is provided by the
subjective approach to the application of probability theory. The
subjectivists or personalists belong to the Bayesian school, and they regard
probability as a measure of personal belief in a stated proposition.
Statistical decision theory is based on the Bayesian approach. In this
chapter a brief description of statistical decision theory is given. Given a
problem situation in which alternative courses of action are available,
each of which may lead to a set of mutually exclusive outcomes associated
with certain probabilities, which course of action should a decision-maker
take? This is the problem of statistical decision theory.


In any organisation the main function of the executive is to make
decisions. The organisation is faced with several types of decision
problems, such as: Should a new product be introduced in the market?
How many units of a product should be produced? How many units
of a product should be marketed? How many units of a particular
machine part or raw material should be kept in stock? The decision-maker
has to face such problems endlessly. In each such decision problem
there are certain common elements, which are called the ingredients
of decision problems. These ingredients are:
First, a decision is needed in a problem situation where two or more
alternative courses of action are available and only one of these
actions can be taken. Obviously, if there is only one course of action
available, no decision is required, since that action must be taken in order
to solve the problem. For the sake of simplicity the possible actions are
symbolised by a1, a2, a3, ..., etc. The totality of all possible actions is
called the action space, denoted by A. If there are only three possible actions,
we write A = action space = {a1, a2, a3}. The decision procedure involves
selecting among the alternatives a single course of action that can
actually be carried out. If a course of action is selected that cannot
be carried out in the existing situation and circumstances, it will be
tantamount to a waste of time and resources. Quite often the objective of
a decision is to select an act which will accomplish some predesignated
purpose. The decision taken may be regarded as satisfactory or not
depending upon whether it has helped in the attainment of that
objective.


Secondly, in all decision problems uncertainty is found to be a
common element. When the outcome of some action is not known
in advance, the outcome is said to be uncertain. When there are many
possible outcomes of an event (also called states of nature), one cannot
predict with certainty what will happen; it is only in terms of probabilities
that we may be able to talk. The various states of nature (outcomes) are
symbolised by θ1, θ2, θ3, ..., etc. The totality of all outcomes is called
the nature space or state space, symbolised by Θ. If an action leads to three
outcomes θ1, θ2 and θ3, then we write:

Θ = {θ1, θ2, θ3}.


For example, if a product is marketed it may be highly appreciated
(outcome θ1), it may not appeal to the customers (outcome θ2), or it may
be liked by a certain fraction of the customers, say 25% (outcome θ3).


Lastly, a number of consequences result from each action under
different conditions, the conditions being the various states of nature. In
general, if there are m possible actions and n admissible states of nature,
the consequences will be n × m in number. In practical situations,
particularly in business and economic problems, consequences can be expressed
in terms of money and utility.


The consequences may be evaluated in several ways such as : 
(i) in terms of profit 

(ii) in terms of cost, and 

(iii) in terms of opportunity loss. 


In most decision problems the expected monetary value is used as
a decision criterion. When consequences are evaluated in terms of profit,
they are called payoffs. A payoff table is prepared to show the
relation between all possible states of nature, all possible actions and the
values associated with the consequences. A specimen payoff table is
given below:
An n × m Payoff Table

                             Actions
States of nature    a1     a2     a3     ...    ak     ...    am
θ1                  p11    p12    p13    ...    p1k    ...    p1m
θ2                  p21    p22    p23    ...    p2k    ...    p2m
θ3                  p31    p32    p33    ...    p3k    ...    p3m
...                 ...    ...    ...    ...    ...    ...    ...
θi                  pi1    pi2    pi3    ...    pik    ...    pim
...                 ...    ...    ...    ...    ...    ...    ...
θn                  pn1    pn2    pn3    ...    pnk    ...    pnm


In the above table, the column headings designate the various
actions out of which the decision-maker may choose, while the row
headings show the admissible states of nature under which the
decision-maker has to take the decision. The cell value pik shows the payoff
resulting from taking action ak when the state of nature is θi, for all
k = 1, 2, 3, ..., m and i = 1, 2, 3, ..., n. A payoff table represents the
economics of a problem, a problem of revenue and costs. A payoff may be
thought of as a conditional value or conditional profit (loss). It is a
conditional value in the sense that associated with each course of action
there is a certain profit (or loss), given that a specific state of nature
has occurred. A payoff table thus contains the conditional values of all
possible combinations of actions and states of nature.


The calculation of payoffs depends on the problem. Very often it is
a relatively easy matter, and sometimes a bit of algebraic reasoning is
required. Payoffs or conditional profits, π, for demand less than the quantity
produced (D < Q) can be computed as the difference between the total
revenue from sales (of the D units demanded) and the total cost of
production (of all Q units). Total profits, π, for a demand equal to or
greater than the quantity produced (D ≥ Q) can be computed as the total
revenue from sales of all that was produced minus the total cost of
producing that quantity.


With the payoff table, the decision-maker may be able to reach the
optimal solution of a problem if he knows what event is going
to occur. Since there is uncertainty about the occurrence of events, a
decision-maker must make some prediction or forecast, usually in terms of
the probability of occurrence of events. With the event probabilities assigned,
the last step of statistical decision theory is to analyse these probabilities
by calculating the expected payoff (EP) or expected monetary value of each
course of action. The decision criterion here is to choose as the optimal
act, OA, the act that yields the highest EP.


An alternative decision criterion of statistical decision theory is what
is called the expected opportunity loss, EOL. This criterion also leads to the
same result as that obtained from expected profits (EP). Robert Schlaifer defines
the opportunity loss of an action or decision “as the difference between the
cost or profit actually realised under that decision and the cost or profit
which would have been realized if the decision had been the best one
possible for the event which actually occurred”. Thus opportunity loss
represents the amount of profit that was lost because the most profitable
action was not selected.

Calculations of EOL are the same as those for EP except that
we use conditional opportunity losses (COL) instead of payoffs.
It may be pointed out that, for a given state of nature, the COL of the
optimal act is zero, while the COL of any other act is positive and equals
the difference between the payoff of the OA and that of the act taken.

If we replace the payoffs by their corresponding opportunity losses,
we get a new table called the loss table. If lik is the opportunity loss
resulting from taking action ak when the state of nature is θi, then lik
satisfies the relation

$$l_{ik} = \max_{j} p_{ij} - p_{ik} \quad \text{for all } i = 1, 2, \ldots, n \text{ and } k = 1, 2, \ldots, m.$$


Illustration 1. A baker produces a certain type of special pastry
at a total average cost of Rs. 3 and sells it at a price of Rs. 5. This pastry
is produced over the weekend and is sold during the following week;
pastries produced but not sold during a week's time are totally
spoiled and have to be thrown away. According to past experience the weekly
demand for these pastries is never less than 78 or greater than 80. You
are required to formulate the action space, state space, payoff table and
loss table.


Solution. It is clear from the problem that the manufacturer
will not produce less than 78 or more than 80 pastries. Thus
there are three courses of action open to him:

a1 = produce 78 pastries
a2 = produce 79 pastries
a3 = produce 80 pastries

Thus the action space is A = {a1, a2, a3}.


The state of nature is the weekly demand for pastries. There are
three possible states of nature, i.e.,

θ1 = demand is 78 pastries
θ2 = demand is 79 pastries
θ3 = demand is 80 pastries

Hence the state space is Θ = {θ1, θ2, θ3}.


The element of uncertainty in the problem is the weekly demand.
The bakery's profits are conditioned by the weekly demand. Cell values
of the payoff table are computed as follows:

p11 = payoff when action a1 is taken and the state of nature is θ1
    = Rs. [5 × 78 − 3 × 78] = Rs. 156.

p12 = payoff when action a2 is taken and the state of nature is θ1
    = Rs. [5 × 78 − 3 × 79] = Rs. 153.

p13 = payoff when action a3 is taken and the state of nature is θ1
    = Rs. [5 × 78 − 3 × 80] = Rs. 150.

Similarly, p21 = payoff when action a1 is taken and the state of nature
is θ2:

p21 = Rs. [5 × 78 − 3 × 78] = Rs. 156.
p22 = Rs. [5 × 79 − 3 × 79] = Rs. 158.
p23 = Rs. [5 × 79 − 3 × 80] = Rs. 155.

Similarly, p31 = payoff when action a1 is taken and the state of nature
is θ3:

p31 = Rs. [5 × 78 − 3 × 78] = Rs. 156.
p32 = Rs. [5 × 79 − 3 × 79] = Rs. 158.
p33 = Rs. [5 × 80 − 3 × 80] = Rs. 160.


These values are tabulated below:

PAYOFF TABLE (Rs.)

                        Actions
States of nature    a1     a2     a3
θ1                  156    153    150
θ2                  156    158    155
θ3                  156    158    160


To calculate opportunity losses, we first calculate maxₖ p1k, maxₖ p2k
and maxₖ p3k:
maxₖ p1k = 156, maxₖ p2k = 158, maxₖ p3k = 160.
l11 = 156 − 156 = 0,   l12 = 156 − 153 = 3,   l13 = 156 − 150 = 6
l21 = 158 − 156 = 2,   l22 = 158 − 158 = 0,   l23 = 158 − 155 = 3
l31 = 160 − 156 = 4,   l32 = 160 − 158 = 2,   l33 = 160 − 160 = 0
The loss table corresponding to the payoff table is given below:

LOSS TABLE (Rs.)

                        Actions
States of nature    a1    a2    a3
θ1                  0     3     6
θ2                  2     0     3
θ3                  4     2     0
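The payoff and loss tables of this illustration can be generated mechanically. The sketch below assumes the payoff rule described earlier: revenue on the min(D, Q) pastries actually sold, minus the cost of all Q pastries produced. Each loss is then derived as the best payoff in its row minus the payoff in the cell. The names `payoff`, `PRICE` and `COST` are illustrative.

```python
# Sketch: payoff and opportunity-loss tables for the pastry illustration.
# Assumed payoff rule (as described in the text): revenue on pastries actually
# sold minus the cost of every pastry produced; unsold pastries are worthless.

PRICE, COST = 5, 3            # Rs. per pastry
demands = [78, 79, 80]        # states of nature theta_1 .. theta_3 (rows)
quantities = [78, 79, 80]     # actions a_1 .. a_3 (columns)


def payoff(demand, quantity):
    return PRICE * min(demand, quantity) - COST * quantity


# payoff_table[i][k]: state of nature i, action k
payoff_table = [[payoff(d, q) for q in quantities] for d in demands]

# l_ik = (best payoff in row i) - p_ik
loss_table = [[max(row) - p for p in row] for row in payoff_table]

for d, p_row, l_row in zip(demands, payoff_table, loss_table):
    print(f"demand {d}: payoffs {p_row}, losses {l_row}")
```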
Illustration 2. Suppose that in a process of producing bulbs the
number of defective bulbs produced is constant but unknown. Consider
a lot of 100 bulbs which is either to be sold at Rs. 5 each, giving a
double-money-back guarantee for each defective item, or to be junked at a
cost of Rs. 100 for the lot. Construct the action space, state space and
payoff table.


Solution. Two possible actions are open to the manufacturer,
namely,

a1 = junk the lot at a loss of Rs. 100 for the lot;

a2 = sell the lot at Rs. 5 each, giving a double-money-back guarantee
for each defective item.

Thus, A = {a1, a2} = action space.


The number of defective articles in the lot designates the states of
nature. There are 101 possible states of nature. Let θi denote that there
are i defective bulbs in the lot. Thus

Θ = {θ0, θ1, ..., θ100}
or Θ = {0, 1, 2, 3, ..., 100}.


In the problem given it is very difficult to work out all possible
payoffs, since there are 101 states of nature. Therefore, we determine the
functional form of the payoffs. If the manufacturer takes action a1 he loses
Rs. 100 whatever the state of nature. Hence pi1 = −100 for all i = 0, 1, 2, ..., 100.
If he takes action a2 he may lose or gain depending upon the number
of defective bulbs in the lot, because he is providing a double-money-back
guarantee. If more than 50% of the bulbs in the lot are defective, his loss is
inevitable; otherwise he makes a profit. In this case we can write the payoffs
as:

pi2 = 100 × 5 − θi × 10
    = 500 − 10θi,   for all i = 0, 1, 2, ..., 100

Thus his payoffs are

pi1 = −100          for all i = 0, 1, 2, ..., 100
pi2 = 500 − 10θi    for all i = 0, 1, 2, ..., 100
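Since the bulb problem has 101 states of nature, it is convenient to work with the payoff functions directly. The following Python sketch encodes pi1 = −100 and pi2 = 500 − 10θi and scans all states; the function names are illustrative. It confirms that selling makes a loss once more than 50 bulbs are defective. It also shows that selling is at least as good as junking only while no more than 60 bulbs are defective, so neither action is best for every state of nature.

```python
# Sketch: payoff functions of the bulb illustration (a lot of 100 bulbs).
# theta = number of defective bulbs in the lot, 0 .. 100.


def payoff_junk(theta):
    return -100                  # a1: junk the lot at a cost of Rs. 100


def payoff_sell(theta):
    return 100 * 5 - 10 * theta  # a2: Rs. 5 per bulb, Rs. 10 refunded per defective


states = range(101)

# States in which selling produces a loss
loss_from_selling = [t for t in states if payoff_sell(t) < 0]
# States in which selling is at least as good as junking
sell_not_worse = [t for t in states if payoff_sell(t) >= payoff_junk(t)]

print("selling loses money from theta =", min(loss_from_selling))
print("selling is at least as good as junking up to theta =", max(sell_not_worse))
```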
Optimal decisions. After we have prepared a payoff or loss table,
it is easy to arrive at a decision and to select the best course of
action. It would be desirable to choose the action which maximises
payoffs or minimises opportunity losses for all states of nature. If such
an action exists, it is called the uniformly best action. Unfortunately,
a uniformly best action very seldom exists, because a particular action
may be best for some states of nature and worst for the remaining states
of nature. For instance, consider Illustration 1, where it is clear that
the action a3 is best when the state of nature is θ3. On the other hand,
the same action is worst when the state of nature is θ1, and none of the
actions is found to be uniformly best. A similar situation exists in
Illustration 2. Hence, the decision-maker needs some criterion or
principle for making a choice amongst alternative actions. We shall
mention here some important principles or criteria for selecting an optimal
action. An action is said to be optimal if its payoff is best (as large as
possible) according to the criterion or principle under consideration.


1. The Maximin Principle. Of all the principles the simplest is
the maximin principle for determining an optimal action when
consequences are given in terms of profits. According to this principle the
decision-maker first observes the minimum payoff of each action over the
various possible states of nature. Then he selects the action for which the
minimum payoff is maximum. This principle places a value on each action
according to the worst that can happen with that action; in a sense, the
decision-maker expects the worst and prepares for it. Symbolically, the maximin
action is the one which maximises minᵢ pik. Let us apply this principle
to Illustrations 1 and 2 for finding the optimal action. For Illustration 1, the
minimum payoffs over the various states of nature are given below:

minᵢ pi1 = minimum payoff for fixed action a1 = 156
minᵢ pi2 = minimum payoff for fixed action a2 = 153
minᵢ pi3 = minimum payoff for fixed action a3 = 150

These minimum payoffs are maximum for action a1. Hence the maximin
principle suggests that the decision-maker take action a1. The maximin
action is a1 = produce 78 pastries.

In Illustration 2, the minimum payoffs over the various states of
nature are given below:

minᵢ pi1 = −100
minᵢ pi2 = −500    [the minimum value occurs when θi = 100]

Clearly the minimum payoff is maximum for action a1 (since −100 > −500).
Hence the maximin action is a1, i.e., junk the lot at a cost of Rs. 100
for the lot.

The maximin principle discussed above usually requires a considerable
amount of computation. Furthermore, a serious objection is that it
is generally too pessimistic, since the decision-maker values each action
by the state of nature for which his payoff is minimum.
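The maximin rule is easy to express in code. The sketch below uses a hypothetical `maximin` helper, with payoff tables stored as rows = states of nature and columns = actions, i.e. p[i][k]. It finds the worst payoff of each action and returns the action with the largest worst payoff as a 0-based index, together with the list of worst payoffs.

```python
# Sketch: maximin rule on a payoff table stored as p[i][k]
# (rows = states of nature, columns = actions).


def maximin(payoff_table):
    n_actions = len(payoff_table[0])
    # worst (minimum) payoff of each action over all states of nature
    worst = [min(row[k] for row in payoff_table) for k in range(n_actions)]
    best_k = max(range(n_actions), key=lambda k: worst[k])
    return best_k, worst


pastry = [[156, 153, 150],
          [156, 158, 155],
          [156, 158, 160]]
bulbs = [[-100, 500 - 10 * t] for t in range(101)]   # columns: junk, sell

print(maximin(pastry))   # (0, [156, 153, 150]): action a1
print(maximin(bulbs))    # (0, [-100, -500]): action a1
```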


2. The Minimax Principle. This principle is used only when
consequences are given in the form of opportunity losses, i.e., in the form
of a loss table. The decision-maker first observes the maximum opportunity
loss of each action over all the various states of nature. He then selects the
action for which the maximum opportunity loss is minimum. This principle gives
the greatest possible protection against the largest loss. Symbolically, the
minimax action is the one which minimises maxᵢ lik.

Let us apply this principle to Illustration 1 for finding the optimum
action. In this illustration the maximum losses over the various states of
nature are given below:

maxᵢ li1 = 4,   maxᵢ li2 = 3,   maxᵢ li3 = 6

Clearly the maximum loss is minimum for action a2. Hence this principle
suggests taking action a2, i.e., producing 79 pastries.
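Similarly, a minimal sketch of the minimax rule first derives the loss table from the payoff table using lik = maxⱼ pij − pik and then picks the action whose largest loss is smallest. The helper names `loss_table` and `minimax_loss` are illustrative, and actions are reported as 0-based indices.

```python
# Sketch: minimax (opportunity-loss) rule. The loss table is derived from
# the payoff table as l_ik = max_j p_ij - p_ik (rows = states of nature).


def loss_table(payoff_table):
    return [[max(row) - p for p in row] for row in payoff_table]


def minimax_loss(losses):
    n_actions = len(losses[0])
    # largest opportunity loss of each action over all states of nature
    worst = [max(row[k] for row in losses) for k in range(n_actions)]
    best_k = min(range(n_actions), key=lambda k: worst[k])
    return best_k, worst


pastry = [[156, 153, 150],
          [156, 158, 155],
          [156, 158, 160]]

print(minimax_loss(loss_table(pastry)))   # (1, [4, 3, 6]): action a2
```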


3. The Bayes Principle. The Bayes principle for the selection of
an optimal action derives its name from the 18th-century philosopher
Thomas Bayes, who first suggested and investigated the notion of “inverse
probability” or “subjective probability”. The disciples of the Bayesian
school regard probability as a measure of the degree of personal belief, and
they prefer to be called subjectivists. On the other hand, the disciples
of the non-Bayesian school consider probability as a long-run relative
frequency, and they prefer to be labelled objectivists. One major advantage
of the Bayesian approach is that the decision-maker selects a course of
action on a rational basis, since he uses a subjective evaluation of probability
based on experience, past performance, judgment, etc. In the following
few pages Bayesian statistical decision theory is briefly explained.


To make use of the Bayes principle in a statistical decision problem,
the decision-maker must be able to assign probabilities to the various states of
nature. The sum of these probabilities must add to one. These probabilities
constitute the prior distribution of the states of nature.

After determining the prior distribution, the Bayes principle is to be
used phase-wise. The three phases in order of their occurrence are:
(i) prior analysis, (ii) preposterior analysis, and (iii) posterior analysis.
A brief description of each of these is given below:


(i) Prior Analysis. Once the relevant prior distribution of the various
states of nature is found, the decision-maker needs to compute the
expected payoff (abbreviated EP) or expected opportunity loss (EOL) for
each action. The EP for action ak, denoted by E(ak), is computed by the
formula:

$$E(a_k) = \sum_{i} p_{ik}\, g(\theta_i) \quad \text{for all } k$$

where g(θi) is the prior probability of the state of nature θi. If in this
formula payoffs are replaced by the corresponding opportunity losses, we get
the EOL for action ak. The decision-maker chooses the action for which EP
is maximum or EOL is minimum.


Consider Illustration 1. Suppose the manufacturer by some means
finds the following prior distribution:

PRIOR DISTRIBUTION

States of nature θi (demand)    θ1 = 78    θ2 = 79    θ3 = 80    Total
Prior probabilities g(θi)       0.3        0.4        0.3        1.0

The expected payoffs are computed as follows:

E(a1) = Rs. [156 × 0.3 + 156 × 0.4 + 156 × 0.3] = Rs. 156.00
E(a2) = Rs. [153 × 0.3 + 158 × 0.4 + 158 × 0.3] = Rs. 156.50
E(a3) = Rs. [150 × 0.3 + 155 × 0.4 + 160 × 0.3] = Rs. 155.00

Action a2 has the most attractive EP. Hence producing 79
pastries is preferred to any other action.

Instead of profits we can use opportunity losses in the above
analysis. If we do so, the conclusion will not be altered; it can change only
if the prior distribution itself is changed. In general, EOL and EP lead to the
same conclusion, because for every action EP + EOL equals the expected payoff
under certainty, which does not depend on the action chosen.
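The prior analysis of the pastry problem can be reproduced with the short Python sketch below; the helper name `expected` is illustrative. It computes E(ak) = Σᵢ pik g(θi) for each action and the corresponding EOL from the loss table. It also shows that EP + EOL is the same (Rs. 158) for every action, which is why both criteria select a2.

```python
# Sketch: prior analysis for the pastry illustration.
# E(a_k) = sum_i p_ik * g(theta_i); EOL is the same sum taken over opportunity losses.

pastry = [[156, 153, 150],
          [156, 158, 155],
          [156, 158, 160]]
prior = [0.3, 0.4, 0.3]        # g(theta_i) for demand 78, 79, 80


def expected(table, probs):
    n_actions = len(table[0])
    return [sum(g * row[k] for g, row in zip(probs, table)) for k in range(n_actions)]


losses = [[max(row) - v for v in row] for row in pastry]
ep = expected(pastry, prior)   # EP of a1, a2, a3
eol = expected(losses, prior)  # EOL of a1, a2, a3

best_by_ep = max(range(3), key=lambda k: ep[k])
best_by_eol = min(range(3), key=lambda k: eol[k])

print("EP :", [round(v, 2) for v in ep])
print("EOL:", [round(v, 2) for v in eol])
print("EP + EOL:", [round(e + l, 2) for e, l in zip(ep, eol)])
print("best action index by EP and by EOL:", best_by_ep, best_by_eol)
```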


(ii) Preposterior Analysis. After making the prior analysis, the
decision-maker must decide either to collect additional information regarding the
states of nature or to take the action suggested by the prior analysis.
The prior distribution is not always a perfect predictor of the states of
nature; this is especially so in business decision problems. However, if
somehow the decision-maker finds a perfect predictor, he would prefer
actions based on it, for it would enable him to maximise
his profits or minimise his losses. The highest expected profit obtained
in the presence of a perfect predictor is called the expected payoff of perfect
information (EPPI). EPPI is often called the expected value of payoff
under certainty. Perfect prediction reduces the opportunity losses
due to uncertainty to zero. The highest expected payoff in the absence of a
perfect predictor is the EP of the optimal action. The difference between EPPI
and EP is called the expected value of perfect information (abbreviated EVPI).
EVPI represents the maximum amount of money which a decision-maker
could spend to obtain additional information regarding the states of
nature. It may be noted that EVPI is always equal to the EOL of
selecting the optimum action under uncertainty. This follows from the
identity EP + EOL = EPPI, which holds for every action: for the optimal
action, EVPI = EPPI − EP = EOL.
The main objective of preposterior analysis is to determine whether
or not it is profitable to gather additional information regarding the states
of nature before taking the final action. Additional information may be
gathered by conducting a survey, by carrying out an experiment or by
some other means. The objective of preposterior analysis is fulfilled by
computing EVPI. If EVPI is larger than the cost involved in
gathering the additional information, it is advisable to gather the additional
information regarding the states of nature.


Illustration 3. A businessman wants to construct a hotel. He
usually builds a 25-, 50- or 100-bed hotel, depending on whether the anticipated
demand is low, medium or high. The businessman has been able to find
out the net profits, which are expressed in the table below, and the prior
distribution regarding the states of nature, which is given in the next table.

PAYOFF TABLE (net profits in Rs.)

                                        Actions
                        a1                 a2                 a3
States of nature        Build 25-bed       Build 50-bed       Build 100-bed
                        hotel              hotel              hotel
θ1 = Low demand          20,000            −10,000            −30,000
θ2 = Medium demand       25,000             30,000             −5,000
θ3 = High demand         30,000             50,000             60,000

PRIOR DISTRIBUTION

States of nature (demand)    θ1     θ2     θ3     Total
Prior probabilities g(θi)    0.2    0.3    0.5    1.00

(a) Compute EP, EPPI and EVPI.

(b) A research firm agrees to conduct a survey for Rs. 8,000 and
provide him with information regarding the states of nature. Should the
survey be conducted?


Solution. To compute EP we have to compute the expected payoff
of each action under uncertainty, as follows:

E(a1) = Rs. [20,000 × 0.2 + 25,000 × 0.3 + 30,000 × 0.5] = Rs. 26,500
E(a2) = Rs. [−10,000 × 0.2 + 30,000 × 0.3 + 50,000 × 0.5] = Rs. 32,000
E(a3) = Rs. [−30,000 × 0.2 + (−5,000) × 0.3 + 60,000 × 0.5] = Rs. 22,500

From the above computation it is clear that the highest expected
payoff or profit is associated with action a2. Hence, the highest expected
payoff under uncertainty = EP = Rs. 32,000.

Next, to compute EPPI we have to find the highest payoff for
each state of nature under certainty, i.e., under the assumption that a perfect
predictor is available. Clearly, from the payoff table, when the state of
nature is known to be θ1 the businessman would take action a1, as a result
of which he makes a net profit of Rs. 20,000. Similarly, when the state
of nature is known to be θ2 or θ3, he correspondingly takes action
a2 or a3, by which he makes a net profit of Rs. 30,000 or Rs. 60,000
respectively. The highest expected payoff under certainty is computed as

EPPI = Rs. [20,000 × 0.2 + 30,000 × 0.3 + 60,000 × 0.5] = Rs. 43,000.

The expected value of perfect information is

EVPI = EPPI − EP = Rs. [43,000 − 32,000] = Rs. 11,000.

The EVPI is larger than the expenditure of Rs. 8,000 incurred in conducting
the survey to collect further information regarding the states of nature.
Hence it is advisable to conduct the survey.
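A Python sketch of the same computation is given below; the variable names are illustrative. It also computes the expected opportunity loss of each action, to check the statement that EVPI equals the EOL of the optimal action under uncertainty. With the prior probability 0.2 for low demand, E(a3) works out to Rs. 22,500.

```python
# Sketch: EP, EPPI and EVPI for the hotel illustration
# (rows = low / medium / high demand, columns = 25-, 50-, 100-bed hotel).

payoffs = [[20_000, -10_000, -30_000],
           [25_000,  30_000,  -5_000],
           [30_000,  50_000,  60_000]]
prior = [0.2, 0.3, 0.5]
survey_cost = 8_000

n_actions = len(payoffs[0])
ep = [sum(g * row[k] for g, row in zip(prior, payoffs)) for k in range(n_actions)]
EP = max(ep)                                                # best action under uncertainty
EPPI = sum(g * max(row) for g, row in zip(prior, payoffs))  # best action in each state
EVPI = EPPI - EP

# Expected opportunity loss of each action; the smallest should equal EVPI
eol = [sum(g * (max(row) - row[k]) for g, row in zip(prior, payoffs))
       for k in range(n_actions)]

print("E(a_k):", [round(v) for v in ep])
print("EP =", round(EP), " EPPI =", round(EPPI), " EVPI =", round(EVPI))
print("EOL:", [round(v) for v in eol])
print("conduct the survey:", EVPI > survey_cost)
```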


(iii) Posterior Analysis. If it is decided to gather further information
regarding the states of nature, by conducting a survey, by performing an
experiment or by some other means, posterior analysis is used after the
information has been gathered. In posterior analysis the relevant new information
is combined with the prior information in order to strengthen the degree of
belief regarding the states of nature. In other words, the decision-maker
revises the prior distribution, as a result of which he gets a new distribution
of the states of nature, called the posterior distribution. If the additional
information is collected by conducting a survey, the outcome of the survey
converts the prior distribution into the posterior distribution. The
decision-maker will ultimately be interested in the expected posterior payoffs. We
shall carry out the posterior analysis by taking the following example:


Illustration 4. Consider Illustration 3, in which the businessman
collects additional information through the research firm. The research firm
provides information by conducting surveys which are used to obtain an
estimate of either low demand (X1), medium demand (X2) or high
demand (X3), depending on the results of the survey. The reliability of
these estimates provided by the research firm is presented in the
following table:

CONDITIONAL PROBABILITIES

States of nature    X1      X2      X3      Total
θ1                  0.60    0.25    0.15    1.00
θ2                  0.15    0.60    0.25    1.00
θ3                  0.05    0.20    0.75    1.00
The cell values of the above table are conditional probabilities, the condition being the various states of nature. For instance, the uppermost left cell value, 0.60, is the probability that the survey estimates low demand (X1) when the demand is actually low. Hence Pr[X1 | θ1] = 0.60. The other cell values are interpreted similarly. Now, by using the formula $\Pr[\theta_i \cap X_j] = \Pr[X_j \mid \theta_i]\,\Pr[\theta_i]$ for all i, j = 1, 2, 3, the joint probabilities of each sample estimate and each state of nature are computed. These probabilities are shown in the following table.


JOINT AND MARGINAL PROBABILITIES

State of nature        | Prior probability Pr(θi) | Pr(θi ∩ X1) | Pr(θi ∩ X2) | Pr(θi ∩ X3)
θ1                     | 0.2                      | 0.120       | 0.050       | 0.030
θ2                     | 0.3                      | 0.045       | 0.180       | 0.075
θ3                     | 0.5                      | 0.025       | 0.100       | 0.375
Marginal probabilities |                          | 0.190       | 0.330       | 0.480
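As a cross-check on the table, the joint and marginal probabilities can be recomputed with a small Python sketch. The inputs are the prior and conditional probabilities given above. The standard `fractions` module keeps the arithmetic exact, and the variable names are illustrative, not part of the original text.

```python
from fractions import Fraction as F

PRIOR = [F(2, 10), F(3, 10), F(5, 10)]  # Pr[theta_i]
# LIKELIHOOD[i][j] = Pr[X_j | theta_i], from the conditional probability table.
LIKELIHOOD = [
    [F(60, 100), F(25, 100), F(15, 100)],
    [F(15, 100), F(60, 100), F(25, 100)],
    [F(5, 100), F(20, 100), F(75, 100)],
]

# Joint probabilities: Pr[theta_i and X_j] = Pr[X_j | theta_i] * Pr[theta_i].
joint = [[lik * prior for lik in row] for row, prior in zip(LIKELIHOOD, PRIOR)]

# Marginal probabilities Pr[X_j]: the column sums of the joint table.
marginal = [sum(joint[i][j] for i in range(3)) for j in range(3)]

for i, row in enumerate(joint, start=1):
    print(f"theta{i}:", [float(x) for x in row])
print("marginal:", [float(x) for x in marginal])  # marginal: [0.19, 0.33, 0.48]
```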


The last row of the above table gives the marginal probabilities, computed by the formula $\sum_{i} \Pr(\theta_i \cap X_j)$ for all j = 1, 2, 3. The marginal probability of Xj is denoted by Pr[Xj]. Next, by using Bayes' theorem we can combine the sample information with the prior information to strengthen the degree of belief regarding the states of nature. Bayes' theorem gives new conditional probabilities of the states of nature, called posterior probabilities. The posterior probabilities are computed by using the following Bayes' formula:


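$$
\Pr[\theta_i \mid X_j] = \frac{\Pr[X_j \mid \theta_i]\,\Pr[\theta_i]}{\sum_{r=1}^{3} \Pr[X_j \mid \theta_r]\,\Pr[\theta_r]} = \frac{\text{joint probability of } \theta_i \text{ and } X_j}{\text{marginal probability of } X_j}
$$

for all i, j = 1, 2, 3.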


For example,

$$
\Pr[\theta_1 \mid X_1] = \frac{\Pr(\theta_1 \cap X_1)}{\Pr(X_1)} = \frac{0.120}{0.190} = 0.63
$$
The posterior probabilities are shown in the following table.

POSTERIOR PROBABILITIES

States of nature | Pr[θi | X1]        | Pr[θi | X2]        | Pr[θi | X3]
θ1               | 0.120/0.190 = 0.63 | 0.050/0.330 = 0.15 | 0.030/0.480 = 0.06
θ2               | 0.045/0.190 = 0.24 | 0.180/0.330 = 0.55 | 0.075/0.480 = 0.16
θ3               | 0.025/0.190 = 0.13 | 0.100/0.330 = 0.30 | 0.375/0.480 = 0.78
Total            | 1.00               | 1.00               | 1.00
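The same Bayes computation can be sketched in Python from the prior and conditional probabilities given above. `posterior` is a hypothetical helper written only for this illustration. Exact fractions are used internally and rounded to two decimals only for display, as in the table.

```python
from fractions import Fraction as F

PRIOR = [F(2, 10), F(3, 10), F(5, 10)]
LIKELIHOOD = [  # Pr[X_j | theta_i]
    [F(60, 100), F(25, 100), F(15, 100)],
    [F(15, 100), F(60, 100), F(25, 100)],
    [F(5, 100), F(20, 100), F(75, 100)],
]


def posterior(prior, likelihood):
    """Return post[i][j] = Pr[theta_i | X_j] using Bayes' theorem."""
    n_states, n_outcomes = len(prior), len(likelihood[0])
    post = [[F(0)] * n_outcomes for _ in range(n_states)]
    for j in range(n_outcomes):
        # Denominator: the marginal probability of observing X_j.
        marginal = sum(likelihood[i][j] * prior[i] for i in range(n_states))
        for i in range(n_states):
            post[i][j] = likelihood[i][j] * prior[i] / marginal
    return post


post = posterior(PRIOR, LIKELIHOOD)
for i, row in enumerate(post, start=1):
    print(f"theta{i}:", [round(float(p), 2) for p in row])
# theta1: [0.63, 0.15, 0.06]
# theta2: [0.24, 0.55, 0.16]
# theta3: [0.13, 0.3, 0.78]
```

Each column of `post` sums exactly to 1, which is a quick sanity check on the table.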


Now we have to compute the expected payoffs or profits for each action under the assumption that the states of nature follow the posterior distribution. Such expected profits or payoffs are called expected posterior payoffs. The expected posterior payoff for action ak when the sample estimate is Xj is computed by the formula

$$
y(a_k \mid X_j) = \sum_{i=1}^{3} \Pr[\theta_i \mid X_j]\,p_{ik}, \qquad k, j = 1, 2, 3,
$$

where $p_{ik}$ is the payoff of action $a_k$ when the state of nature is $\theta_i$. The expected posterior payoffs for each action at the various levels of sample estimate are computed below:
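$$
\begin{aligned}
y(a_1 \mid X_1) &= \sum_{i} \Pr[\theta_i \mid X_1]\,p_{i1} = \Pr[\theta_1 \mid X_1]\,p_{11} + \Pr[\theta_2 \mid X_1]\,p_{21} + \Pr[\theta_3 \mid X_1]\,p_{31} \\
&= \text{Rs. } [20{,}000 \times 0.63 + 25{,}000 \times 0.24 + 30{,}000 \times 0.13] = \text{Rs. } 22{,}500 \\
y(a_1 \mid X_2) &= \sum_{i} \Pr[\theta_i \mid X_2]\,p_{i1} = \Pr[\theta_1 \mid X_2]\,p_{11} + \Pr[\theta_2 \mid X_2]\,p_{21} + \Pr[\theta_3 \mid X_2]\,p_{31} \\
&= \text{Rs. } [20{,}000 \times 0.15 + 25{,}000 \times 0.55 + 30{,}000 \times 0.30] = \text{Rs. } 25{,}750
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
y(a_1 \mid X_3) &= \text{Rs. } [20{,}000 \times 0.06 + 25{,}000 \times 0.16 + 30{,}000 \times 0.78] = \text{Rs. } 28{,}600 \\
y(a_2 \mid X_1) &= \text{Rs. } [-10{,}000 \times 0.63 + 30{,}000 \times 0.24 + 50{,}000 \times 0.13] = \text{Rs. } 7{,}400 \\
y(a_2 \mid X_2) &= \text{Rs. } [-10{,}000 \times 0.15 + 30{,}000 \times 0.55 + 50{,}000 \times 0.30] = \text{Rs. } 30{,}000 \\
y(a_2 \mid X_3) &= \text{Rs. } [-10{,}000 \times 0.06 + 30{,}000 \times 0.16 + 50{,}000 \times 0.78] = \text{Rs. } 43{,}200 \\
y(a_3 \mid X_1) &= \text{Rs. } [-30{,}000 \times 0.63 - 5{,}000 \times 0.24 + 60{,}000 \times 0.13] = \text{Rs. } {-12{,}300} \\
y(a_3 \mid X_2) &= \text{Rs. } [-30{,}000 \times 0.15 - 5{,}000 \times 0.55 + 60{,}000 \times 0.30] = \text{Rs. } 10{,}750 \\
y(a_3 \mid X_3) &= \text{Rs. } [-30{,}000 \times 0.06 - 5{,}000 \times 0.16 + 60{,}000 \times 0.78] = \text{Rs. } 44{,}200
\end{aligned}
$$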
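A small sketch reproduces these nine expected posterior payoffs. It takes the two-decimal posterior probabilities from the posterior table and the payoffs used in the calculations above. Probabilities are scaled to hundredths so that all the arithmetic is exact integer arithmetic. The dictionary layout and the function name are illustrative choices.

```python
# Posterior probabilities from the table, in hundredths (0.63 -> 63),
# so that the arithmetic stays exact with integers.
POSTERIOR = {  # outcome -> [Pr[theta1|X], Pr[theta2|X], Pr[theta3|X]]
    "X1": [63, 24, 13],
    "X2": [15, 55, 30],
    "X3": [6, 16, 78],
}
PAYOFF = {  # action -> payoffs (Rs) under theta1, theta2, theta3
    "a1": [20_000, 25_000, 30_000],
    "a2": [-10_000, 30_000, 50_000],
    "a3": [-30_000, -5_000, 60_000],
}


def expected_posterior_payoff(action, outcome):
    """y(a_k | X_j) = sum_i Pr[theta_i | X_j] * p_ik, in rupees."""
    total = sum(p * x for p, x in zip(POSTERIOR[outcome], PAYOFF[action]))
    return total // 100  # undo the hundredths scaling (exact for these data)


table = {(a, x): expected_posterior_payoff(a, x) for a in PAYOFF for x in POSTERIOR}
for x in POSTERIOR:
    print(x, [table[(a, x)] for a in PAYOFF])
# X1 [22500, 7400, -12300]
# X2 [25750, 30000, 10750]
# X3 [28600, 43200, 44200]
```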
The expected posterior payoffs are tabulated below so as to make comparison easy.

EXPECTED POSTERIOR PAYOFFS

Outcome | y(a1 | Xj) | y(a2 | Xj) | y(a3 | Xj)
X1      | Rs. 22,500 | Rs. 7,400  | Rs. −12,300
X2      | Rs. 25,750 | Rs. 30,000 | Rs. 10,750
X3      | Rs. 28,600 | Rs. 43,200 | Rs. 44,200


The final step in this analysis is to compare the expected posterior payoffs at each level of outcome. If the outcome is X1 (the demand is estimated to be low), it is desirable to take action a1 (build a 25-bed hotel). The expected posterior payoff of a1 is the highest when X1 is observed, namely Rs. 22,500. Similarly, when X2 is observed it is advisable to take action a2, by which he makes an expected profit of Rs. 30,000.

Finally, when X3 is observed it is profitable to take action a3, by which he makes an expected profit of Rs. 44,200.
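The comparison described above amounts to picking, for each observed outcome, the action with the largest expected posterior payoff. A minimal Python sketch, taking the expected posterior payoffs tabulated above as input (the dictionary layout is an illustrative choice):

```python
# Expected posterior payoffs (Rs) from the table:
# outcome -> {action: y(action | outcome)}
EXPECTED_POSTERIOR = {
    "X1": {"a1": 22_500, "a2": 7_400, "a3": -12_300},
    "X2": {"a1": 25_750, "a2": 30_000, "a3": 10_750},
    "X3": {"a1": 28_600, "a2": 43_200, "a3": 44_200},
}

# Decision rule: for each observed outcome, take the action with
# the highest expected posterior payoff.
decision = {x: max(ys, key=ys.get) for x, ys in EXPECTED_POSTERIOR.items()}
for x, a in decision.items():
    print(f"observe {x}: take {a}, expected payoff Rs {EXPECTED_POSTERIOR[x][a]:,}")
```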


Posterior analysis needs a lot of computation as well as a sound knowledge of statistics. Students are therefore advised not to use this analysis without a sound knowledge of statistics.


MISCELLANEOUS ILLUSTRATIONS 


Illustration 5. A proprietor of a food-stall has invented a new item of food delicacy which he calls WHIM. He has calculated that the cost of manufacture is Re 1 per piece and that, because of its novelty and quality, it would be sold for Rs 3 per piece. It is, however, perishable, and any goods unsold at the end of the day are a dead loss. He expects the demand to be variable and has drawn up the following probability distribution expressing his estimates:

No. of pieces demanded : 10    11    12    13    14    15
Probability            : 0.07  0.10  0.23  0.38  0.12  0.10

(i) Find an expression for his net profit or loss if he manufactures m pieces and only n are demanded. Consider separately the two cases n ≤ m and n > m.

(ii) Assume that he manufactures 12 pieces. Using the results in (i) above, find his net profit or loss for each level of demand.
(iii) Using the probability distribution, calculate his expected net profit or loss if he manufactures 12 pieces.
(iv) Calculate similarly the expected profit or loss for each of the other levels of manufacture (10 ≤ m ≤ 15).
(v) How many pieces should be manufactured so that his net expected profit is maximum? (C.A., 1974)
Solution. (i) The proprietor does not produce more than 15 pieces of WHIM or less than 10 pieces. His profit is determined by the demand (n) and the production (m). When the demand is more than the production, he can sell only the m pieces he has made, so his profit shall be
Rs 3 × m − Rs 1 × m = Rs 2m (if n > m)

When the production equals or exceeds the demand, he sells n pieces but has paid for all m, so his profit shall be
Rs 3 × n − Rs 1 × m = Rs (3n − m) (if n ≤ m)
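The two cases can be folded into a single expression, profit = 3 × min(m, n) − m, since only min(m, n) pieces can be sold while all m must be paid for. A short Python sketch that generates the payoff table from it (the function name `profit` is illustrative):

```python
COST, PRICE = 1, 3  # Rs per piece, from the illustration


def profit(m, n):
    """Net profit (Rs) when m pieces are made and n are demanded.

    Unsold pieces are a dead loss, so only min(m, n) pieces are sold:
    n > m  -> 3m - m = 2m
    n <= m -> 3n - m
    """
    return PRICE * min(m, n) - COST * m


LEVELS = range(10, 16)  # demand and production both run from 10 to 15
for n in LEVELS:
    print(n, [profit(m, n) for m in LEVELS])  # one row of the payoff table
```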


PAYOFF TABLE (Rs)

Demand (n) \ Production (m) | 10 | 11 | 12 | 13 | 14 | 15
10                          | 20 | 19 | 18 | 17 | 16 | 15
11                          | 20 | 22 | 21 | 20 | 19 | 18
12                          | 20 | 22 | 24 | 23 | 22 | 21
13                          | 20 | 22 | 24 | 26 | 25 | 24
14                          | 20 | 22 | 24 | 26 | 28 | 27
15                          | 20 | 22 | 24 | 26 | 28 | 30


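(ii) The third column of the payoff table given above (m = 12) shows the net profit for each level of demand. It is Rs 18, 21, 24, 24, 24 and 24 for demands of 10, 11, 12, 13, 14 and 15 pieces respectively.

(iii) If he manufactures 12 pieces, his expected profit will be as follows:
Rs [(0.07 × 18) + (0.10 × 21) + (0.23 × 24) + (0.38 × 24) + (0.12 × 24) + (0.10 × 24)] = Rs 23.28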
(iv) The expected profits for the other levels of manufacture are calculated below, where a1, a2, ..., a6 denote the actions of manufacturing 10, 11, ..., 15 pieces:
B(a1) = Rs [(0.07 × 20) + (0.10 × 20) + (0.23 × 20) + (0.38 × 20) + (0.12 × 20) + (0.10 × 20)] = Rs 20
B(a2) = Rs [(0.07 × 19) + (0.10 × 22) + (0.23 × 22) + (0.38 × 22) + (0.12 × 22) + (0.10 × 22)] = Rs 21.79
B(a3) = Rs 23.28 [calculated above]
B(a4) = Rs [(0.07 × 17) + (0.10 × 20) + (0.23 × 23) + (0.38 × 26) + (0.12 × 26) + (0.10 × 26)] = Rs 24.08
B(a5) = Rs [(0.07 × 16) + (0.10 × 19) + (0.23 × 22) + (0.38 × 25) + (0.12 × 28) + (0.10 × 28)] = Rs 23.74
B(a6) = Rs [(0.07 × 15) + (0.10 × 18) + (0.23 × 21) + (0.38 × 24) + (0.12 × 27) + (0.10 × 30)] = Rs 23.04

(v) From the above calculations it is clear that he should manufacture 13 pieces to maximise his expected profit.
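The expected profits in (iii) and (iv) can be recomputed with the sketch below, using the demand distribution of this illustration. Probabilities are held in hundredths, so each sum is an exact integer number of paise. The names are illustrative.

```python
# Probability of each demand level, in hundredths (0.07 -> 7), from the illustration.
DEMAND_PROB = {10: 7, 11: 10, 12: 23, 13: 38, 14: 12, 15: 10}


def profit(m, n):
    """Net profit (Rs) with cost Re 1 and price Rs 3 per piece; unsold pieces are lost."""
    return 3 * min(m, n) - m


def expected_profit_paise(m):
    """Expected profit of producing m pieces, in paise (Rs x 100), kept exact."""
    return sum(prob * profit(m, n) for n, prob in DEMAND_PROB.items())


expected = {m: expected_profit_paise(m) for m in range(10, 16)}
for m, e in expected.items():
    print(f"m = {m}: Rs {e / 100:.2f}")
best = max(expected, key=expected.get)
print("best production level:", best)  # best production level: 13
```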

Illustration 6. Under an employment promotion programme it is proposed to allow the sale of newspapers on buses during off-peak hours. The vendor can purchase the newspapers at a special concessional rate of 25 paise per copy against the selling price of 40 paise. Any unsold copies are, however, a dead loss. A vendor has estimated the following probability distribution for the number of copies demanded:

Number of copies : 15    16    17    18    19    20

Probability      : 0.04  0.19  0.33  0.26  0.11  0.07

How many copies should he order so that his expected profit will be a maximum?

Solution. The vendor does not purchase less than 15 copies or more than 20 copies. His profit is determined by the demand (D) and the number of copies purchased (P). When the demand is more than the number of copies purchased, his profit will be:

Rs 0.40 × P − Rs 0.25 × P = Rs 0.15 P ...(i)
When the demand is less than or equal to the number of copies purchased, his profit is:
Rs 0.40 × D − Rs 0.25 × P ...(ii)

From (i) and (ii) we can make the following payoff table:
PAYOFF TABLE (Rs)

Demand (D) \ Purchase (P) | 15   | 16   | 17   | 18   | 19   | 20
15                        | 2.25 | 2.00 | 1.75 | 1.50 | 1.25 | 1.00
16                        | 2.25 | 2.40 | 2.15 | 1.90 | 1.65 | 1.40
17                        | 2.25 | 2.40 | 2.55 | 2.30 | 2.05 | 1.80
18                        | 2.25 | 2.40 | 2.55 | 2.70 | 2.45 | 2.20
19                        | 2.25 | 2.40 | 2.55 | 2.70 | 2.85 | 2.60
20                        | 2.25 | 2.40 | 2.55 | 2.70 | 2.85 | 3.00
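The payoff table follows from formulas (i) and (ii), which combine into profit = 40 × min(P, D) − 25 × P paise. A minimal Python sketch that regenerates the table (the names are illustrative, and amounts are kept in paise to avoid rounding):

```python
COST, PRICE = 25, 40  # paise per copy


def profit_paise(p, d):
    """Profit (paise) when p copies are bought and d are demanded; unsold copies are lost."""
    return PRICE * min(p, d) - COST * p


LEVELS = range(15, 21)
for d in LEVELS:
    # One row of the payoff table, shown in rupees.
    print(d, [f"{profit_paise(p, d) / 100:.2f}" for p in LEVELS])
```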
His expected profits are calculated below, where a1, a2, ..., a6 denote ordering 15, 16, ..., 20 copies:
B(a1) = Rs [(2.25 × 0.04) + (2.25 × 0.19) + (2.25 × 0.33) + (2.25 × 0.26) + (2.25 × 0.11) + (2.25 × 0.07)] = Rs 2.25
B(a2) = Rs [(2.00 × 0.04) + (2.40 × 0.19) + (2.40 × 0.33) + (2.40 × 0.26) + (2.40 × 0.11) + (2.40 × 0.07)] = Rs 2.38
B(a3) = Rs [(1.75 × 0.04) + (2.15 × 0.19) + (2.55 × 0.33) + (2.55 × 0.26) + (2.55 × 0.11) + (2.55 × 0.07)] = Rs 2.44
B(a4) = Rs [(1.50 × 0.04) + (1.90 × 0.19) + (2.30 × 0.33) + (2.70 × 0.26) + (2.70 × 0.11) + (2.70 × 0.07)] = Rs 2.37
B(a5) = Rs [(1.25 × 0.04) + (1.65 × 0.19) + (2.05 × 0.33) + (2.45 × 0.26) + (2.85 × 0.11) + (2.85 × 0.07)] = Rs 2.19
B(a6) = Rs [(1.00 × 0.04) + (1.40 × 0.19) + (1.80 × 0.33) + (2.20 × 0.26) + (2.60 × 0.11) + (3.00 × 0.07)] = Rs 1.97


It is clear from the above calculations that the expected profit is maximum for the third action. Hence he should order 17 copies in order to maximise his expected profit.
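A short sketch repeats the expected-profit comparison for every order size from 15 to 20, using the demand probabilities given in the illustration. Amounts stay exact integers, in units of paise × hundredths of probability, until they are printed. The names are illustrative.

```python
# Demand distribution in hundredths (0.04 -> 4), from the illustration.
DEMAND_PROB = {15: 4, 16: 19, 17: 33, 18: 26, 19: 11, 20: 7}
COST, PRICE = 25, 40  # paise per copy


def profit_paise(p, d):
    """Profit in paise when p copies are bought and d are demanded."""
    return PRICE * min(p, d) - COST * p


# Expected profit as an exact integer in units of 1/10,000 rupee
# (paise x hundredths); divide by 10,000 to get rupees.
expected = {
    p: sum(prob * profit_paise(p, d) for d, prob in DEMAND_PROB.items())
    for p in range(15, 21)
}
for p, e in expected.items():
    print(f"order {p}: Rs {e / 10_000:.3f}")
best = max(expected, key=expected.get)
print("best order:", best)  # best order: 17
```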
EXERCISES 
1. Explain clearly the various ingredients of a decision problem. 
2. “Uncertainty in a statistical decision problem is inevitable." 
Justify the statement giving suitable examples. 
3. Explain clearly the following:
(i) Action space, (ii) State of nature,
(iii) Payoff table, (iv) Opportunity loss.
4. Explain the following, giving a suitable example:
(i) The minimax principle, (ii) The maximin principle, (iii) The Bayes' principle, (iv) Expected value of perfect information, (v) Highest expected payoffs with perfect information, (vi) Highest expected payoffs under uncertainty.

5. What is the most difficult aspect of decision-making? Why does this difficulty arise?

6. A baker makes a certain kind of pastry at night and sells it the next day. It is perishable and must be thrown away if not sold during the day. The unit cost and price of the pastry are Re 1 and Rs 3 respectively. According to past experience, the daily demand and the respective probabilities are:

Demand 20521 Day VE IIE SLD

Probability : 0.1 0.2 0.3 0.3 0.1
(i) Formulate the action space and state space.
(ii) Construct the payoff table.
(iii) Construct the loss table.

(iv) Determine the maximin and minimax action.

(v) Determine the Bayes' action.

(vi) Compute the highest expected payoff with perfect information.

7. A certain output is manufactured at Rs 50 and sold at Rs 75 per unit. The product is such that if it is produced but not sold during a week's time it becomes worthless. The weekly sales records in the past are as follows:


Demand per week : 20 21 22 23 
No. of weeks each sales 
level was recorded : 200 350 800 150 


(i) Calculate the expected sales of the month. 
(ii) Prepare a table of payoffs for different possible acts. 
(iii) Prepare a table of expected payoffs and select the optimal act. 


8. A stall at a certain subway station sells a daily newspaper for 40 paise a copy, for which it pays 25 paise. Unsold papers are returned for a refund of 5 paise a copy. The daily sales and corresponding probabilities are as follows:

Daily sales : 500 600 700
Probability : 0.5 0.3 0.2
(i) How many copies should it order each day?

(ii) If unsold copies cannot be returned and are useless, what should be the optimal order each day?


9. To promote the sales of a new magazine the publishers supply it to the vendor at a price of Rs 1.50, while the printed price is Rs 2.50; i.e., the vendor gets a rupee on each magazine sold. It is a monthly magazine and as such unsold copies during the month are a dead loss. A vendor has estimated the following probability distribution for the number of copies demanded:


No. of copies : 40 41 42 43 44 45
Probability *08 16° 21—250 3970: 5213 11 


How many copies should he order so that his expected profits will 
be a maximum ? 


10. Two items of value Rs 240 and Rs 300 are to be bid for simultaneously (with sealed bids) by A and B. Both bidders announce their intention of devoting a total of Rs 330 to the two bids. If each uses a minimax criterion, what are the resulting bids?

(Simla, M.B.A., 1974)

11. Describe some methods which are useful for decision-making 
under uncertainty. Illustrate each by an example. 

(Chandigarh, M.B.A., 1977) 


SECTION 1 
STATISTICS—WHAT AND WHY 


1. (a) What is Statistics? Discuss its scope and limitations.
(B. Com., Meeruth, 1968)

(b) Define the term ‘statistics’ and discuss its functions.

(B. Com. Pass, Delhi, 1972)
(c) Discuss the importance of Statistics in business.

(B. Com. Pass, Delhi, 1974)


2. (a) Discuss the use of statistical methods in business pointing out their 
limitations. (M.B.A., Delhi, 1969) 
(b) Discuss briefly the role of statistical methods in economic planning 

with special reference to India. (B.A. Hons. Econ., Delhi, 1972)
3. Examine critically a few of the important definitions of Statistics and 

state the one which you consider to be the best. (B. Com., Osmania, 1969) 


4. “Statistics can prove anything.” 
“Figures cannot lie.” 


Comment on the above two statements, indicating reasons for the existence of such divergent views regarding the nature and functions of Statistics.
(B. Com., Madras, 1968)


5. Define ‘Statistics’ and point out the main difficulties that a statistician 
has to face as compared with a physicist or chemist. (B. Com., Andhra, 1967)
6. Comment on the following statements : 


(a) “Statistics is the science of averages.”

(b) “Statistics is the science of counting.”

(c) “Statistics is the science of estimates and probabilities.” (1975)


7. “Statistics is a body of methods for making wise decisions in the face of uncertainty.”—Wallis and Roberts. Elucidate.

8. “Statistics is the science of estimates and probabilities.” Elucidate the above statement and give a more comprehensive definition of the science of statistics.
(B. Com., Andhra, 1969)
9. “Statistics is the science of averages.” Do you agree with this view? If not, give reasons and suggest a proper definition. (B. Com., Nagpur, 1968)


10. Examine critically the important definitions of statistics, pointing out the one you think best. (B. Com., Nagpur, 1970)


11. (a) Write an essay on “Statistics in the service of trade and commerce.”

(b) Explain and illustrate the use of statistics for economic analysis and planning. (M. A. Econ., Meeruth, 1974)


12. Describe briefly the different kinds of statistical methods and explain
their usefulness to economists and businessmen. (B. Com., Madras, 1968) 


13. Discuss the usefulness of statistics to the State, the economist, the
industrialist and the trader. (B. Com., Osmania, 1967 ; B. Com., Bangalore, 1969) 


14. “Statistics are like clay of which you can make a God or Devil, as you please.”
In the light of the above statement, discuss the uses and limitations of Statistics. (B. Com., Part II, Nagpur, 1969)


15. Define Statistics and show how it can help the extension of scientific
knowledge, the establishment of a sound business and the formulation of a plan 
for national economic development. 

(B. Com., Part II, Bangalore, 1968 ; B. Com., Punjab, 1970) 




R-2 STATISTICAL METHODS 


16. “Statistics affects everybody and touches life at many points. It is both a science and an art.”


Explain the above statement with suitable examples. 
(B. Com., Nagpur, 1968 ; B. Com., Rajasthan, 1974) 


17. Discuss the importance of the study of Statistics and show how it can 
help the extension of scientific knowledge, the establishment of a sound business and the introduction of social and political reforms. (B. Com., Mysore, 1968)


18. (a) “Statistics is all-pervading.” Elaborate.
(b) “Statistics is what statisticians do.” Examine critically.
19. “Statistics are numerical statements of facts but all facts numerically stated are not statistics.” Comment upon the statement and state briefly which numerical statements of facts are not statistics.

(B. Com., Allahabad, 1970; B. Com., Punjab, 1974)

20. “Statistics only furnishes a tool, necessary, though imperfect, which is dangerous in the hands of those who do not know its use and deficiencies.”
—Bowley. Discuss. (B. Com., Bombay, 1969)


21. Write a critical note on the limitations and distrust of Statistics. Discuss the important causes of distrust and show how Statistics could be made more reliable. (B. Com., Osmania, 1966)


22. “The science of statistics is a most useful servant but only of great value to those who understand its proper use.”—King. Comment on this statement. (B. Com., Kerala, 1968)


23. “It is sometimes said that statistics are used the way a drunkard uses a lamp-post: for support rather than for illumination.” Discuss.
(B. Com., Madurai, 1969)


24. “Statistical methods are most dangerous tools in the hands of the inexpert. Statistics is one of those sciences whose adepts must exercise the self-restraint of an artist.” Explain fully the significance of the statement.

(B. Com., Allahabad, 1966)

25. ‘There are three kinds of lies: lies, damned lies and statistics.’ Comment on this statement and point out the limitations of the science of statistics.

(B. Com., Andhra, 1969)

26. Are statistical methods likely to be of any use to a business firm? Illustrate your answer with some typical business problems and the statistical techniques to be used there. (M.B.A., Delhi, 1967)

27. Discuss how in modern times statistics is the science of human welfare. 

(B. Com., Delhi, 1971) 

28. “Planning without statistics is a ship without rudder and a compass.” 
In the light of this statement explain the importance of statistics as an effective
aid to national planning in India. (B. Com., Nagpur, 1969) 

29. Statistics arose from practical requirements of problems in various 
spheres and its importance is due to its use in treating such problems. Discuss
giving suitable examples. (M.A. Econ., 1970) 

30. Discuss the utility of Statistics to the State and the industrialist. 

(B. Com., Mysore, 1970) 


31. (a) What are the characteristics and limitations of Statistics ? 
(b) Explain with illustration the use of Statistics in business and 
industry. (C.A., 1970) 
32. “Science without Statistics bear no fruit and Statistics without science have no roots.” Explain the above statement, showing the relationship of Statistics with some other sciences. (B. Com., Nagpur, 1971; M.A. Econ., Jabalpur, 1973)
33. “Statistics is to economics what a few grains of boiled pot rice is to the cook: only the latter sees his universe also.” Discuss giving suitable examples.
(M.A. Econ., Meerut, 1972)
34. (a) Explain clearly the three meanings of the word ‘statistics’ contained in the following statement:
“You compute statistics from statistics by statistics.”
(b) Explain clearly the meaning of the statement:
“Not a datum, but the data are the subject-matter of statistics.”
(B. Com., Agra, 1968)
35. “Statistical methods include all those devices of analysis and synthesis by means of which statistics are scientifically collected and used to explain or describe phenomena, either in their individual or related capacities.” Secrist. Elucidate the statement. (M. A., Jodhpur, 1966)
36. Reconcile the following statements : 


(a) “Statistics can prove anything.” 


(b) “Statistics prove nothing.” [B. Com., Nagpur, (Supp.), 1971] 
37. What are the main limitations of statistics? Can these shortcomings 
be overcome ? (B. Com., Raj., 1973) 


38. “Statistics are the straws out of which I, like every other economist, have to make the bricks.”—Marshall. Elucidate the above statement.
(M.A., Econ., Agra, 1973)
39. Distinguish between Statistics and Statistical Methods. Discuss the scope and significance of the study of statistics.
(M.A. Econ., Punjab, 1973; B. Com., Punjabi, 1973)
40. What is statistics? What are its limitations? Give suitable illustrations in support of your answer. (B. Com., Delhi, 1973)
41. “Without adequate understanding of statistics, the investigator in social sciences may frequently be like the blind man groping in a dark closet for a black cat that is not there.” Comment. Can the statement be extended to the field of natural sciences also? [B.A. (Hons.) Econ., Delhi, 1973]
42. Illustrate with suitable examples the use of Statistical Methods in
Commerce, Business and Industry. (C. A. May, 1973) 


43. (a) Explain the significance of statistics in economic analysis. Give 


examples of problems which indicate the importance of Statistics in economics. 
(B.A. Hons. Econ., Delhi, 1974) 
(b) Discuss the scope, utility and limitations of Statistics.
(M.A. Econ., Meeruth, 1975)


SECTION 2 
CONDUCTING A STATISTICAL ENQUIRY


1. State the preliminary steps you would take for planning a statistical enquiry. (B. Com., Mysore, 1966)
2. Describe the process of planning a statistical enquiry with special 
reference to its scope and purpose, choice between sample and census approaches, 
preparation of questionnaire, and accuracy and analysis of data.
(B. Com., Delhi, 1969)
3. What is a statistical enquiry? Describe the main stages in a statistical enquiry. (B. Com., Poona, 1966)
4. Explain the various stages of statistical enquiry, illustrating your answer with special reference to a statistical enquiry into the health conditions of industrial workers in the city of Nagpur. (B. Com., Madras, Sept. 1968)
5. Describe the various steps that are taken in conducting a survey.
(C. A. Nov., 1965)
6. You are required to plan a sample survey to study the problems of 
indebtedness among rural agricultural population in India. Suggest a suitable plan. Draft a simple questionnaire that may be used in this connection.
(I.C.W.A., 1966)
7. You are asked to determine a suitable standard level of minimum wages for industrial labour in India. Indicate fully how you would design a simple survey for the purpose. (B. Com., Madras, April 1965)


8. You are required to collect data on the extent and nature of unemployment in urban areas using the sample survey method. How do you proceed?
(B. Com., Madras, Sept. 1968)


9. (a) Define a statistical unit. Mention the usual kinds of units employed in statistical work. What are the essential points to be observed in the choice of a good unit?


(b) Giving appropriate reasons, state what units can be used for the 
following cases : 


(i) Production of cotton textile industry, 
(ii) Labour employed in an industry, and 
(iii) Consumption of electricity. 
(C.A., May, 1967; B. Com., Bangalore, 1968)
10. What steps would you take in collecting the necessary data for introducing compulsory primary education in the State? (B. Com., Bangalore, 1969)
11. What do you mean by a statistical enquiry? Describe the steps to be taken in conducting an enquiry. (B. Com., Kerala, 1969)


12. Outline the various stages of a statistical enquiry and explain the important considerations to be borne in mind while planning and conducting such an enquiry. (B. A., Bombay, 1968)
13. Discuss the important steps that should be considered in planning a statistical survey. (B. Com., Nagpur, 1970)
14. How would you organise a survey to assess the nature and extent of 
rural savings in the district of Nagpur? (M. Com., Nagpur, 1971)
15. Define a statistical unit and explain what should be the essential requisites of a good statistical unit. (C.A. May, 1973)
SECTION 3 


PRIMARY AND SECONDARY DATA 


1. Discuss the appropriateness of the method of collecting data by (i) mailed questionnaire, and (ii) personal interviews. (M.A. Econ., Delhi, 1969)
2. Explain the advantages of direct personal investigation as compared with the other methods generally used in collecting data.
(B. Com., Bangalore, 1968)
3. Compare the different methods used in the collection of statistical data. Explain the importance of determining a statistical unit in the collection of data. (B. Com., Poona, 1967; M.A. Econ., Meeruth, 1974)
4. Explain what precautions must be taken while drafting a questionnaire in order that it may be really useful. Illustrate your points.
(C.A. Nov., 1968)
5. (a) Describe briefly the ‘Questionnaire’ method of collection of primary data, stating the essentials of a good questionnaire. (B. Com., Poona, 1965)
(b) Describe briefly the other features of a good questionnaire. Draft a suitable questionnaire for studying the Five-Year Plans consciousness in your area. (B. Com., Andhra, 1966)
6. “It is never safe to take published statistics at their face value without
knowing their meaning and limitation." Elucidate this statement by enumerating 
and briefly explaining the various points which you would consider before using 
any published statistics. Illustrate your answer by examples wherever possible. 


Distinguish between primary and secondary data. What are the various methods by which primary data are collected? (M.A. Econ., Jabalpur, 1974; M.A. Econ., Meeruth, 1975)

7. As the personnel manager in a particular industry, you are asked to determine the effect of increased wages on output in the industry. Draft a suitable questionnaire for the purpose. (B. Com., Madras, 1965)


8. You are requested to plan a survey with a view to control and abolish
street begging in Madras City. Outline the main steps you would take and draft a 
suitable schedule to collect the necessary data. (B. Com.,Madras, 1967) 


9. For the preparation of a plan of economic development of a village, what data would you require and how will you collect them? Comment on the sources of data.

10. Explain the main points that you would keep in mind while editing
primary data. 

11. Distinguish between primary source and secondary source of statistical data. What precautions would you take before using data from a secondary source?

(C.A., 1965; B. Com., Andhra, 1967)
12. Discuss the validity of the statement: “A secondary source is not as reliable as a primary source.”

13. Define ‘secondary data’. State their chief sources and point out the 
dangers involved in their use and what precautions are necessary before using 
them. (C.A. Nov., 1967) 

14. Describe the primary and secondary methods of collecting data. In
what special circumstances are the two methods suitable ? 

(B. Com., Andhra, 1968) 

15. Define primary data. State the various methods of collecting primary 
data and discuss their relative merits. (C.A. Nov., 1969)

16. Distinguish between primary and secondary data and discuss the various methods of collecting primary data. Indicate the situation in which each of these methods should be used. (B. Com., Andhra, 1976; B. Com., Punjabi, 1975)

17. “In collection of data commonsense is the chief requisite and experience the chief teacher.” Discuss. (B. Com., Delhi, 1970)

18. (a) What are the chief factors to be considered in planning a question- 
naire ? 

(b) Make up a questionnaire from which you would hope to obtain the information you needed in an investigation of housing conditions in a small town.

[B. A. (Hons.) Econ. Delhi, 1971] 

19. (a) Describe the methods generally employed in the collection of 
statistical data stating briefly their merits and demerits. (B. Com., Nagpur, 1971) 

(b) Distinguish between primary and secondary data. Describe the various methods of collection and mention their merits and demerits.
(B. Com., Delhi, 1975)

20. Describe the different methods of collecting data, indicating the merits
and demerits of each of them. Which method is suitable to the following types 
of enquiries ? 

(i) Enquiry into the food situation by a committee appointed by the 
government. 


(ii) Enquiry by a consumer regarding the commodity requirements of
its customers. 
(iii) Enquiry by a Research Organisation into the living conditions of
the Cotton Textile Workers. 
(iv) Study of the socio-cultural aspects of the life in NEFA. 
(B. A. Bombay, 1969) 
21. (a) What do you mean by a questionnaire? State the essential points to be observed in drafting a good questionnaire.
(b) Describe carefully the merits and defects of the questionnaire method of investigation and the precautions necessary in using this method of investigation. (C. A., May, 1971)
22. Define secondary data. What are their sources and the precautions
necessary for using them ? 

23. You have been directed by your employer to carry out a market survey 
to ascertain the probable demand of a new cancer drug. Prepare a suitable
questionnaire in this connection. State also the types of persons and organisations 


that you may have to contact to ascertain the demand. 
[B. A. (Hons.) Econ., Delhi, 1971]


24. What do you mean by secondary data? What are their sources?
Explain briefly why should we be careful in using them. (B. A., May, 1973) 


25. Prepare a questionnaire to study the utilization of help rendered by the various agencies in 1972-73 to the drought affected students in Maharashtra State.
(B. Com., Poona, 1973)


26. Explain the various methods of collecting statistical data. Which of 
these would you prefer and why ? (B. Com., Punjab, 1974) 
SECTION 4 


SAMPLING AND SAMPLE DESIGNS 


1. (a) “Sampling is necessary under certain conditions.” Explain this with illustrative examples.

Point out the importance of sampling in solving business and economic problems. What are the principles on which sampling methods rest?
(I.C.W.A., 1968)


(b) Why is sampling necessary in statistical investigations? Discuss the
important methods of sampling commonly used. (M.A. Econ., Jabalpur, 1974) 


2. Define clearly the Law of Statistical Regularity and state its application
in the economic and social spheres. (B. Com., Andhra, 1968) 

3. Explain the terms ‘random sample’, ‘stratified random sample’, and 
“purposive sample’. 


Explain the importance of sampling theory in Economics.
(M. A. Econ., Delhi, 1968) 


4. State the advantages of adopting sampling procedures in carrying out
large-scale surveys. (C. A., Nov., 1969) 


5. (a) What is systematic random sampling ? How does it differ from 
purposive sampling ? 
(b) What are the advantages of random sampling?


(c) How does the size of sample affect ‘Sampling Errors’ ? 
(M. A. Econ., Lucknow, 1969) 


6. Distinguish between the ‘census’ and ‘sampling’ methods of collecting 
data and compare their merits and defects. (B. Com., Osmania, 1967 ; 
B. Com., Mysore, 1969; C. A., 1970) 


7. What is sampling? What precautions would you take in choosing a sample? [B.Com. (Pass), Delhi, 1974]
8. Point out the significance of sampling. Distinguish between random
sampling and deliberate sampling. (B. Com., Delhi, 1967) 


9. What is the utility of sampling in statistics? Write short notes on:


(a) Random Sample.

(b) Biased Sample. 

(c) Stratified Sample.

(d) Population or Universe. 
(e) Quota Sampling. 
10. Explain the importance of sampling. What are the well-known methods of sampling?
An industry is composed of 100 independent organisations. A sample of 
30 was selected composed of 10 «mall. 10 middle-sized and 10 large units. ja this 
sample a satisfactory опе? Give reasons. (I. C.W. A., 1968) 
11. What do you understand by ‘sampling’ ? In order to determine a new
cost of living index it is proposed to make a survey of the income and expenditure
of 1,000 households in a large city. Describe carefully two methods which might
be used to select the sample households.


12. It is required to obtain a representative sample of 1,000 people for an
investigation into reading habits. Comment on the following methods for obtaining
such a sample :
(a) By choosing 1,000 names from the telephone directory.
(b) By stopping 1,000 people at random outside a main line station.
(c) By asking 100 librarians to supply 10 names each.
13. (a) Describe the methods available for collection of statistical data,
stating advantages and disadvantages of each method. How would you draw a
random sample from a finite population with random sampling numbers ?
(I.C.W.A., 1970)
(b) Describe the various techniques of sampling known to you and
discuss their relative merits and demerits. (B.A. Hons. Econ., Delhi, 1974)
14. Classify the methods generally employed in the collection of statistical 
data and state briefly their respective merits and demerits. (B. Com., Mysore, 1966)
15. Use the table of random sampling numbers to select 60 numbers from
1 to 90. Find the average of the sample, and compare it with the average of the
numbers 1 to 90, i.e., 45.5. Repeat with samples of 50, 40, 30, 20 and 10 numbers.
What conclusions do you draw ?
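The experiment in Exercise 15 can also be tried on a computer. The sketch below is a minimal Python version under stated assumptions: Python's seeded pseudo-random generator stands in for the printed table of random sampling numbers, the first sample size is taken as 60 (the sequence 50, 40, 30, 20, 10 suggests this), and the names `POPULATION` and `sample_mean` are illustrative only.

```python
import random
from statistics import mean

# The numbers 1 to 90 form the population in the exercise.
POPULATION = list(range(1, 91))
POPULATION_MEAN = mean(POPULATION)  # 45.5


def sample_mean(n, rng):
    """Mean of a simple random sample of n numbers drawn without replacement."""
    return mean(rng.sample(POPULATION, n))


# Assumption: a seeded generator replaces the printed table of random
# sampling numbers, so the same samples are drawn on every run.
rng = random.Random(1977)
for n in (60, 50, 40, 30, 20, 10):
    m = sample_mean(n, rng)
    print(f"n = {n:2d}: sample mean = {m:6.2f}, "
          f"deviation from 45.5 = {m - POPULATION_MEAN:+.2f}")
```

A single run need not show the deviations shrinking steadily; repeating the whole run with different seeds shows the means of the larger samples clustering more closely around 45.5.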
16. Distinguish between random sampling and stratified sampling. Suppose
it is desired to survey gasoline buying habits of car owners in a particular city.
How would you go about it ? Draw up a brief questionnaire for the purpose.
(I.C.W.A., 1972)
17. (a) Describe the different methods of sampling known to you. Illustrate
your answer with suitable examples.
(b) Suppose you are asked to conduct a survey of the family expenditures
of the Delhi University teachers. How will you proceed ?
(B.A. Hons. Econ., Delhi, 1969)
18. Point out the differences between a sample survey and a census survey.
Under what conditions are these undertaken ? Explain the law which forms the 
basis of sampling. (B. Com., Mysore, 1969) 
19. (a) Distinguish between the “Census” and “Sampling” methods of
collecting data and compare their merits. (B. Com., Mysore, 1966 ; B. Com., Kerala, 1968)
(b) Explain briefly (your answer should not exceed about 300 words)
why a sample survey is usually preferred to a census survey. Give one example
of a situation where a census survey is imperative. (C.A., 1974)
20. You are required to plan a sample survey to study the problems of
indebtedness among the rural agricultural population in India. Suggest a suitable
plan. Draft a simple questionnaire that may be used in this connection.
(I.C.W.A., 1973)


21. You are required to collect data on the extent and nature of unemployment
in urban areas using the sample survey method. How do you proceed ?
22. “In any sample survey there are many sources of errors. A perfect
survey is a myth.” Discuss the statement.
23. “Data collected in censuses are automatically free of errors.” Discuss
the validity of the statement.
24. “Of the biased errors, the statistician should have none, but of the
unbiased the more the merrier, notwithstanding that they are also errors.”
Elucidate.
25. (a) Discuss the relative merits and demerits of the census method and
the sampling method. Describe briefly different sampling methods.
(B. Com., Delhi, 1975)
(b) Distinguish clearly between sampling and non-sampling errors. Is 
it true to say that non-sampling errors do not arise in a sample survey? How 
will you control these errors? 
26. (a) Distinguish between systematic sampling and stratified sampling.
(B. Com., Delhi, 1971)
(b) Discuss briefly the merits and demerits of sample and census
method of collecting data. (B. Com., Delhi, 1971)


27. Define statistical error. How is it different from a mistake ? Write an
essay on different types of statistical errors and how they are measured.
(C.A., Nov., 1971)


28. (a) What are the advantages of sampling over census as a method of
investigation ?
(b) What is a random sample ? Describe any one method of drawing a
random sample from a population.
(c) What is stratified sampling ? Comment on the relative advantages
and disadvantages of simple random sampling and stratified
sampling. (B. Com., Bombay, 1972)
29. What sort of sampling technique would you adopt to study the opinion
of Delhi University students about Hindi as the medium of instruction ?
[B.A. (Hons.) Econ., Delhi, 1971]
30. If you are appointed to investigate the standard of living of industrial 
workers at Nagpur, how will you proceed to do the job ? Give a specimen of the
questions that you would put. (B. Com., Nagpur, 1972)
31. Explain the procedure you would adopt in conducting a survey for
the purpose of collecting data on demand for scooters in Delhi. Also point out
the various sources of errors in such a survey and indicate the care you would
take in avoiding or reducing these errors. (B.A. Hons. Econ., Delhi, 1973)
32. (a) Describe the situations where you can use :
(i) Stratified sampling and
(ii) Systematic sampling.
(b) Describe the various types of non-sampling errors.
(B. Com., Poona, 1973)
33. What are the main objects of sampling ? Compare the merits and
drawbacks of sample and census enquiries. (M.A. Econ., Meerut, 1973)
34. Distinguish between census and sample method of data collection.
Point out the special advantages of sampling technique and also its limitations, if
any. (B.A. Hons. Econ., Kurukshetra, 1975)
SECTION 5 
CLASSIFICATION AND TABULATION 
1. What do you understand by classification and tabulation ? Discuss
their importance. (B.A. Hons. Econ., Delhi, 1971)


2. Explain the terms classification and tabulation. Point out their
importance in a statistical investigation. What precautions would you take in
tabulating statistical data ? (I.C.W.A., 1969)
3. Explain the general principles of classification of data for forming an
empirical frequency distribution of one variable. (B.Sc., Madras, 1970 ;
B. Com., Bombay, 1971)


4. (a) Define classification and explain the various ways of classification
adopted in Statistics. Write an illustrative essay on the rules of forming a
frequency distribution, particularly the choice of a class-interval and the number of
classes. (C.A., Nov., 1968)
(b) What is classification of statistical data ? Explain the method of
classification by class-intervals with reference to (i) number of classes, (ii) length
of a class interval, and (iii) class limits. (B. Com., Poona, 1973)
5. (a) What is tabulation ? Point out its significance.
(b) What is a statistical table ? Distinguish between one-way, two-way 
table and tables of higher order. 
Illustrate your answer with examples. (B. Com., Poona, 1966) 
6. What are the requisites of a good table ? State the rules that serve as
a guide in tabulating statistical material. (B. Com., Madras, 1960 ; B. Com., Rajasthan, 1974)


7. What are the chief functions of tabulation? What precautions would 
you take in tabulating statistical data ? (B. Com., Mysore, 1968)


8. Outline the considerations you will bear in mind in the construction of 
a frequency distribution. [B.A. (Hons.) Econ., Delhi, 1969]


9. “In collection and tabulation, common sense is the chief requisite and
experience the chief teacher.” — Bowley. Elucidate.
10. Distinguish between : 
(i) Simple table and complex table, 
(ii) General purpose and special purpose table. 
(iii) Inclusive and exclusive method of classification. 
(iv) One-way, two-way and higher order tables. 
11. (a) What is a statistical table ? Explain clearly the essentials of a good
table. (B. Com., Bombay, 1971)
(b) Sketch the format of a statistical table and label its major functional
parts.
12. (a) What are the general rules of forming a frequency distribution with
particular reference to the choice of class interval and number of classes ? Illustrate
with examples. (C.A., May, 1972)


(b) Write short notes on :
(i) Stubs and captions.
(ii) Mechanical tabulation.
(iii) Manifold tabulation.
(iv) Class-interval, class limits, class frequency.
13. Distinguish between ‘classification’ and ‘tabulation’. What precautions
would you take in classifying statistical data ? (B. Com., Bombay, 1972)


14. From the following observations prepare a frequency distribution
table in ascending order starting with 100—110 (exclusive method) :


Income in Rs. 


125 108 112 126 110 132 136 130 149 155 
120 13) 136 138 125 111 112 125 140 148 
147 137 145 150 142 135 137 132 165 154 


(B. Com., Karnatak, 1967) 
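The exclusive method used in Exercise 14 (lower limit included, upper limit excluded) can be checked mechanically. Below is a small Python sketch built around a hypothetical helper, `exclusive_frequency`. For brevity it is applied only to the first line of the income data.

```python
from collections import Counter


def exclusive_frequency(values, start, width):
    """Frequency table by the exclusive method: each class includes its
    lower limit and excludes its upper limit. Values below `start` are ignored."""
    counts = Counter((v - start) // width for v in values if v >= start)
    top = max(counts) if counts else -1
    table = []
    for k in range(top + 1):
        lower = start + k * width
        table.append((f"{lower}-{lower + width}", counts.get(k, 0)))
    return table


# First line of the income data (Rs.) from Exercise 14
row = [125, 108, 112, 126, 110, 132, 136, 130, 149, 155]
for label, f in exclusive_frequency(row, start=100, width=10):
    print(label, f)
```

Note that 110 falls in the class 110 to 120, not 100 to 110. That is the defining feature of the exclusive method.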


15. The following are marks scored by 100 examinees in statistics out of 
a maximum of 100. Group them into classes with an interval of 10.


57 44 8 75 0 18 45 14 0 4 
64 66 72 51 69 34 56 22 34 8 
58 83 20 70 57 28 22 38 5 45 
51 88 17 93 64 36 34 37 58 32 
64 30 80 73 24 46 48 0 16 65 
96 56 20 64 50 63 47 4 32 10 
78 48 55 52 66 8 53 50 0 35 
28 54 38 30 2) 54 52 48 84 50 
94 90 38 84 30 58 20 0 99 42 
79 33 38 60 61 36 10 34 2 80 


(B. Com., Mysore, 1967) 
16. From the following observations prepare a frequency distribution
table in ascending order starting with 5—10 (Exclusive Method) : 


Marks in English :


12 36 43 30 28 20 19 10 10 16 
19 27 15 26 20 19 7 45 33 21 
^ 26 37 5 20 il 17 37 30 20 ci 
(B. Com., Bangalore. 98) 
17. Arrange the following data giving the monthly salaries of 40 workers 
in a brewery in the form of a frequency distribution choosing appropriate class 
interval : 


195 87 108 128 65 100 120 150 
202 212 94 103 145 126 95 88 
107 122 93 147 92 117 135 186 
190 101 148 163 172 96 105 132 
143 131 93 86 145 186 109 106 
(B.A., Madurai, 1973)


18. Explain the importance of tabulation in a scheme of investigation.
Prepare a blank table showing the distribution of students of your university
according to age, sex and class for arranging (a) physical training, and (b) seminar
classes. (B. Com., Madras, 1966)


19. Draft a blank table, ready to be filled, showing the distribution of
students in a university classified according to : (a) Sex, (b) Faculties [Arts,
Science, Commerce and Law], (c) four years (1951, 1956, 1961 and 1966), and
(d) Age-groups (below 15 years, 15—19, 19—21 years, and above).
(B. Com., Osmania, 1967)


20. Present the following information in a suitable tabular form, filling
the gaps :

“The world consumption of tea has declined from 1,683 units in the year
1954 to 1,473 units in 1955.

“The consumption in non-producing countries [(i) U.K. and (ii) the rest]
fell from 1,180 units to 993 units. In non-producing countries, U.K. showed a
fall from 546 units to 498 units.

“Among the producing countries [(i) India, (ii) Pakistan, (iii) Ceylon, (iv)
Indonesia, (v) E. Africa, and (vi) the rest], the consumption in the year 1955 was :
India 238 units, Pakistan 39 units and Ceylon 21 units. The corresponding figures
in 1954 were 245, 43 and 25 units respectively.

“Indonesia showed a fall of 2 units from 20 units. E. Africa consumed only
9 units in 1955, which was a fall by a single unit.” (B. Com., Poona, 1967)


21. In the annual report of a mobile oil company it is indicated that the
company drilled a total of 852 wells in 1957 and 487 in 1958. Two types of drilling
operations were conducted : wild cat and developmental. In 1957, a total of 40
wild cat wells and 842 development wells were drilled; the comparable figures for
1958 were 46 and 44. There were 3 possible outcomes when a well was drilled :
oil, gas or dry hole. Of the wild cat wells drilled in 1957, 6 resulted in oil, 4 in
gas and 30 in dry holes. The comparable figures for 1958 were 6, 4 and 36. Of
the developmental wells drilled in 1957, 660 resulted in oil, 77 in gas and 105 in
dry holes; the comparable figures for 1958 were 333, 77 and 64.


Present the information in the above paragraph in a formal table giving 
appropriate title. 


22. A particular industrial concern has the following information about
its employees : (i) Age, (ii) Educational level, (iii) Type of work done, (iv) Family
size, (v) Pay, (vi) Length of service, and (vii) Marital status. Indicate tables to
bring out the relationship among the above characteristics.

(B. Com., Delhi) 


23. Draft a blank table to show : 


(a) Sex, (b) three ranks : supervisors, assistants and clerks, (c) years 1918
and 1943, and (d) Age groups : 18 years and under, over 18 but less than 55 years,
over 55 years. (B. Com., Madras, 1966)
(i) Export of cotton piece-goods from India.
(ii) To Burma, China, Indonesia, Iran, Iraq. 
(iii) Amount of piece-goods to each country. 
(iv) Value of piece-goods to each country. 
(v) From 1945-46 to 1955-56, year by year.
(vi) Total amount exported each year.
(I.C.W.A., 1966)


25. The city of Timbaktoo was divided into three areas : the administrative
district, other urban district, and rural district. A survey of housing conditions
was carried out and the following information was gathered :

Of the buildings in other urban districts 4,06,400 were inhabited and 4,500 were
under construction. In the administrative district, 4,000 buildings were uninhabited
and 500 were under construction, out of a total of 61,600. The total buildings
in the city that were under construction were 62,000 and those uninhabited were
44,400.
Tabulate the above information so as to give the maximum possible
information. How many buildings were under construction in rural areas ?
(B. Com., Mysore, 1968)
26. Tabulate the following :
“In a trip organised by a college, there were 80 persons each of whom paid
There were 60 students each of whom paid Rs. 16.
Members of the teaching staff were charged at a higher rate. The number of
servants was 6 (all male) and they were not charged anything. The number of
ladies was 20% of the total, of which one was a lady staff member.”
(B. Com., Karnatak, 1968)


27. (a) Prepare a blank table to give as much information as possible of
the summary results of the distribution of population according to sex and four
religions at five age groups in the different States of India.
(b) Information is available on the average size of a family for the population
of a district stratified according to (i) five occupations, (ii) three educational
levels, and (iii) rural and urban areas.
Prepare a suitable table to represent this data. (B.A., Kerala, 1969)
28. Draw up the proforma of a suitable table (complete with title, rulings,
column headings, sub-headings, source, note, etc.) showing the number of
students in your University in various classes, classified according to sex,
residence, domicile and medium of instruction. (B. Com., Andhra, 1968)
29. Construct a blank table in which could be shown, at two different
dates and in five industries, the average wages of the four groups : males and
females, eighteen years and over and under eighteen years.
30. Draft a blank table to show the following information for the United
Kingdom to cover the years 1914, 1939, 1949 and 1956 :
(a) Population 
(b) Income-tax collected 
(c) Tobacco duties collected
(d) Spirits and beer duties collected 


Arrange for columns to show also the “per capita” figures for (b), (c),
(d) and (e). Suggest a suitable title. (Institute of Company Accountants)
31. … investigate the spending of their pocket money … for girls). For this
purpose choose suitable … be analysed and devise a way of tabulating the
information … to obtain.
32. Draw up a table which can be used to display the following information on
the length of service and efficiency of a set of typewriters. Age of machine :
under 1 year ; 1 year and under 2 ; 2 years and under 3 ; 3 years and over.
Types of machine : Remington ; Underwood ; Imperial. Condition of
machine : Serviceable ; slight repairs ; extensive repairs.


33. Draw up a table to show the results of last year's college games (win,
draw, lose), if the boys play football and cricket and the girls hockey and tennis,
and there are three college teams for each game.


34. Rearrange the following blank table with a view to making it more
intelligible :


[Blank table: columns Brahmins, Rajputs, Vaishs and Harijans, each sub-divided into several columns; rows Males and Females]


(B. Com., Allahabad, 1966) 


35. Draw up a blank table showing the exports and imports during the
years 1960, 1961, 1962, 1963 and 1964 relating to the ports of Bombay, Calcutta,
Madras and other ports. The table should provide for the values and balance of
trade and the totals for each year. (C.A., Nov., 1968)


36. Make a frequency table for the following data taking the class limits
of the exclusive type and a class interval of 3 units each :

15, 17, 23, 14, 13, 19, 15, 17, 15, 12, 16, 18, 21, 15, 20, 12, 9, 14, 17,
16, 15, 13, 22, 20, 22, 17, 21, 19, 18, 16, 19, 11, 18, 10.
(B.A., Bombay, 1969) 


37. The total number of accidents on Southern Railway in 1960 was 3,500
and it decreased by 300 in 1961, and by 700 in 1962. The total number of accidents
in the metre-gauge section showed a progressive increase from 1960 to 1962. It was 245
in 1960, 346 in 1961 and 428 in 1962. In the metre-gauge section, “Not compensated”
cases were 59 in 1960, 77 in 1961 and 108 in 1962. “Compensated”
cases in the broad-gauge section were 2,567, 2,587 and 2,152 in these three years
respectively.

From the above report, you are required to prepare a neat table as per the
rules of tabulation. (C.A., Nov., 1971)


38. (a) Prepare a blank table showing the particulars relating to students
of the Bombay University classified according to their age, sex, faculty and three
important religions. (C.A., Nov., 1969)


(b) Explain what is meant by a frequency distribution and point out the
basic principles to be observed in forming the same. 


(c) Form a frequency distribution by taking a suitable class-interval for the
following data giving the age of 52 employees in a governmental agency :

67, 34, 36, 48, 49, 31, 61, 34, 43, 45, 38, 32, 27, 61, 29, 47, 36,
50, 46, 30, 46, 32, 30, 33, 45, 49, 48, 41, 53, 36, 37, 47, 47, 30,
46, 50, 28, 35, 35, 38, 36, 46, 43, 34, 62, 69, 50, 24, 44, 43, 60,
39. (B.A., Madurai, 1970)


39. In a sample study about the coffee habits in two towns, the following
data were observed :

Town A : 60% people were males,
40% were coffee drinkers, and
2672 were male coffee drinkers.
Town B : 55% people were males,
3077 were coffee drinkers, and
20% were male coffee drinkers.

Tabulate the above observations. (I.C.W.A., 1975)
40. (a) What are the components of a good table ?


(b) Construct a blank table in which could be shown, at two different dates
and in five industries, the average wages of the four groups : males and females,
eighteen years and over and under eighteen years. Suggest a suitable title.
[B.A. (Hons.) Econ., Delhi, 1972]


41. (a) What are the considerations involved in the construction of a
frequency distribution ?
(b) If the class mid-points in a frequency distribution of the weights of a group
of persons are 125, 132, 139, 146, 153, 160, 167, 174 and 181
pounds, find
(i) the size of the class interval,
(ii) the class boundaries, and
(iii) the class limits, assuming that the weights are measured to the
nearest pound. [B.A. (Hons.) Econ., Delhi, 1973]
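Part (b) of Exercise 41 reduces to simple arithmetic. The class interval is the common difference between successive mid-points, and each class boundary lies half an interval from its mid-point. When weights are recorded to the nearest pound, the class limits lie half a pound inside the boundaries. The Python sketch below works this out. It assumes integer mid-points and an odd class interval (both true here), and the function names are illustrative.

```python
def class_width(mids):
    """Common difference between consecutive, equally spaced class mid-points."""
    gaps = {b - a for a, b in zip(mids, mids[1:])}
    if len(gaps) != 1:
        raise ValueError("class mid-points are not equally spaced")
    return gaps.pop()


def class_boundaries(mids):
    """Boundaries lie half a class width on either side of each mid-point."""
    half = class_width(mids) / 2
    return [(m - half, m + half) for m in mids]


def class_limits_nearest_unit(mids):
    """Class limits when data are recorded to the nearest whole unit.

    Assumes integer mid-points and an odd class width, as in this exercise.
    """
    width = class_width(mids)
    if width % 2 != 1:
        raise ValueError("this sketch only handles an odd class width")
    half = width // 2
    return [(m - half, m + half) for m in mids]


MIDS = [125, 132, 139, 146, 153, 160, 167, 174, 181]  # pounds
print("Class interval:", class_width(MIDS))
print("Boundaries:", class_boundaries(MIDS))
print("Limits:", class_limits_nearest_unit(MIDS))
```

A class with mid-point 125 therefore has boundaries of 121.5 and 128.5 lb. For weights recorded to the nearest pound, its limits are 122 and 128 lb.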


42. Prepare a blank table giving the population of different districts of
Punjab according to religion and sex. (B. Com., Panjabi, 1975)
43. Discuss the usefulness of diagrams and graphs in presentation of data.
(B.A. Hons. Econ., Kurukshetra, 1975)
SECTION 6 
DIAGRAMMATIC AND GRAPHIC PRESENTATION 
1. (a) Discuss the importance and drawbacks of diagrammatic representation
of data. (B. Com., Bombay, 1968)
(b) Discuss the usefulness of diagrammatic representation of facts.
(B. Com., Delhi, 1973)


2. The merits of diagrammatic presentation of data are classified under
three main headings : attraction, effective impression and comparison. Explain
and illustrate these points. (B. Com., Mysore, 1966)


3. What are the merits and demerits of diagrammatic representation of
statistical data ? Write short notes on any three important methods used for
diagrammatic representation. (B. Com., Madras, 1973)
4. (a) What, in your opinion, are the tests of a good diagram ?
(B. Com., Bangalore, 1968)
(b) Discuss the merits and demerits of diagrammatic representation of
statistical data. (B. Com., Poona, 1973)
5. What considerations must generally be borne in mind while presenting
statistical data ? (B.A. Hons. Econ., Delhi, 1970)


6. (a) Charts or graphs are more effective in attracting attention than
are any of the other methods of presenting data. Do you agree ? Give reasons
and illustrations. (B. Com., Punjab, 1969)
(b) “Diagrams help us visualize the whole meaning of a numerical
complex at a single glance.” Comment.
What points should be taken into consideration while presenting a table
diagrammatically ? (B. Com., Punjab, 1970)
7. “Diagrams do not add anything to the meaning of statistics but when
drawn and studied intelligently, they bring to view the salient characteristics of
graphs and series.” Discuss this statement, describing briefly the various types of
diagrams. (M.A. Econ., Meerut, 197…)
8. How would you illustrate by diagram the following kinds of numerical
data ? Give brief reasons for your choice.
(a) Average yield per acre of rice in India for each of the last 15 years.
(b) The harvest prices of the main cereal crops—rice, wheat, jowar,
bajra, barley and maize—in India for the year 1968. 


(c) The number of deaths in road accidents in a city for each month 
of the last two years. 


(d) The proportion of time spent in a school on each of the subjects 
taught. (M. A. Econ., Meerut, 1973) 


9. State briefly, giving reasons, the kind of diagram you consider most
appropriate for use with each of the following data : 


(a) Number of children per family in a large town. 

(b) Monthly rainfall for a period of three years. 

(c) Monthly output of steel for one year according to the principal
grades of quality. (B. Com., Poona, 1970)


10. “Diagrams help us to visualize the whole meaning of a numerical complex
at a single glance.” Explain. What, in your opinion, are the tests of a good
diagram ? (B.A., Bombay, 1969)

11. (a) Sum up the important rules to be followed in constructing a suitable
graph.
(b) What is a false base line ? Under what conditions would its use
be desirable ? (B. Com., Mysore, 1967)


12. Explain with the help of sketches the construction of the following : 
(a) Bar diagram
(b) Histogram 
(c) Frequency polygon 
(d) Circular diagram. (B. Com., Mysore, 1967) 
13. Differentiate between the natural scale and logarithmic scale used in
graphic presentation of data. In which cases should the latter scale be used ?
Also explain the concept of false base line and the circumstances in which
it should be used. (B. Com., Kerala, 1968)
14. (a) Explain what is meant by a semi-logarithmic diagram and discuss
its advantages over the natural scale diagram. (I.C.W.A., 1966)


(b) What do you mean by a cumulative frequency distribution ? Point out 
its special advantages and uses.


15. (a) What are ‘less than’ and ‘more than’ curves ? What purpose do
they serve ? (B. Com., Bombay, 1967)
(b) Give an account of various charts that may be used for conveying
statistical information. (M.B.A., Delhi, 1969)
(c) Pie diagrams, rectangles, bars and graphs all serve the same
purpose and can be interchanged in use. Discuss.
16. Point out the usefulness of diagrammatic representation of facts and
explain the construction of any of the different forms of diagrams you know.
(B. Com., Panjabi, 1975)


17. Represent the following data by a suitable diagram showing the difference
between total proceeds and total costs :


PROCEEDS AND COSTS OF A FIRM 


(in thousands of rupees)
Year   Total Proceeds   Total Costs      Year   Total Proceeds   Total Costs
1960 22-0 19°5 1963 3 256 
1961 273 217 1964 $3 261 
1962 — 282 3070 1965 333 342 


(I.C.W.A., 1966)
18. The following table gives the average approximate yield of rice in lb.
per acre in various countries of the world in 1968-69 : 


Country   Yield in lb. per acre   Country   Yield in lb. per acre
India 728 Italy 2,903 
Thailand 943 Egypt 2,153 
U.S.A. 1,469 Japan 2,276 


Indicate this by a suitable diagram which will highlight the relative
backwardness of India in this regard.
(I.C.W.A., 1969)


19. Illustrate by a suitable diagram the following data of expenditure of 
an average working class family : 


Item of Expenditure Per cent of Total 
Expenditure 
Food 65 
Clothing 10 
Housing 12 
Fuel and lighting 5 


Miscellaneous 8 
(I.C.W.A., 1968)
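For a pie diagram such as the one asked for in Exercise 19, each item's sector angle is its share of the total multiplied by 360 degrees, so 1 per cent corresponds to 3.6 degrees. Here is a minimal Python sketch. The helper name `sector_angles` is illustrative. Because it divides by the total rather than by 100, it also works on raw amounts.

```python
def sector_angles(shares):
    """Angle of each sector of a pie diagram: share / total * 360 degrees."""
    total = sum(shares.values())
    return {item: round(value * 360 / total, 1) for item, value in shares.items()}


# Per cent of total expenditure, from Exercise 19
EXPENDITURE = {"Food": 65, "Clothing": 10, "Housing": 12,
               "Fuel and lighting": 5, "Miscellaneous": 8}

for item, angle in sector_angles(EXPENDITURE).items():
    print(f"{item:18s} {angle:6.1f} degrees")
```

Food therefore takes 234 degrees and Housing 43.2 degrees, and the five angles add up to 360 degrees.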


20. Represent the following data pertaining to Indian Railways by a
suitable sub-divided bar diagram :
(In crores of rupees) 
1958-59 1959-60 1960-61 


1. Gross Income 390 422 468 
2. Gross Expenditure 331 353 389 
3. Net Income 59 69 79


(B. Com., Lucknow, 1969) 

21. The following table gives the distribution of outlay in the first and 

second Five-Year Plans of India under major heads of development expenditure. 
Represent these figures by suitable diagrams : 


(in crores of rupees) 
Head of Expenditure                  First Plan   Second Plan

1. Agriculture and Community
   Development                          357           268
2. Irrigation and Power                 661           913
3. Industry and Mining                  179           890
4. Transport and Communication          557         1,385
5. Social Services                      533           945
6. Miscellaneous                         60            99
   Total                              2,356         4,500


(B. Com., Lucknow, 1969) 
22. Represent the following data by means of a suitable diagram : 


NUMBER EMPLOYED 
Year Men Women Children Total
70,000 3,60,000 
1960 1.50000 1,60,000 7,20,000 


1960 3,50,000 
(B. Com., Agra, 1968) 
23. The budgets of two families are given below. Represent the data by a
percentage diagram. 


Items of Expenditure Family A Family B 
Food 160 120 
Clothing 80 32 
Rent 60 48 
Light and Fuel 20 16
Miscellaneous 80 24

Total 400 240


(B. Com., Mysore, 1966) 
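A percentage diagram for Exercise 23 needs each item as a percentage of the family total. For sub-divided bars of equal height, it also needs the cumulative percentages at which each component ends. Below is a Python sketch. The helper names are illustrative.

```python
from itertools import accumulate


def percentages(items):
    """Each item as a percentage of the family's total expenditure."""
    total = sum(items.values())
    return {k: round(100 * v / total, 2) for k, v in items.items()}


def bar_cut_points(pcts):
    """Cumulative percentages: where each component ends on a bar of height 100."""
    return list(accumulate(pcts.values()))


FAMILY_A = {"Food": 160, "Clothing": 80, "Rent": 60,
            "Light and Fuel": 20, "Miscellaneous": 80}
FAMILY_B = {"Food": 120, "Clothing": 32, "Rent": 48,
            "Light and Fuel": 16, "Miscellaneous": 24}

for name, family in (("A", FAMILY_A), ("B", FAMILY_B)):
    pcts = percentages(family)
    print(name, pcts, [round(c, 2) for c in bar_cut_points(pcts)])
```

Family A's bar is thus cut at 40, 60, 75 and 80 per cent. Family B's shares (50, 13.33, 20, 6.67 and 10 per cent) show that it spends a larger share on food, even though its total budget is smaller.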


24. The following table shows the monthly expenditure of three families.
Represent the data by a suitable diagram on percentage basis and write a note
on it.

Items of Expenditure Family A Family B Family C
Food articles 43 83 120
Clothing 8 17 25
Recreation 3 10 12
Education $ 9 15
Rent 10 21 17
Miscellaneous 6 15 17

(B.A., Madras, September 1968)
25. The following table gives the details of monthly expenditure of two
families :


Items of Expenditure Family A Family B 
Food 30 90 
Clothing 7 35 
House Rent 8 40
Education 3 12 
Litigation 5 40 
Conventional necessity 3 60 
Miscellaneous 4 23 


Represent the above figures by a suitable diagram. 


(B. Com., Nagpur, 1967) 


26. Draw a suitable diagram to represent the following information : 


Items Family A Family B
Food 120 150
Clothing 80 150
Rent 40 100
Other expenses 160 200
Total 400 600


(B. Com., Marathwada, 1969) 


27. Represent the following data by a “Pie Diagram” :
Cheques cleared in India in clearing houses in the years 1960 and 1965

Centres Amount in crores of rupees
1960 1965 

Bombay 829 670 
Calcutta 1,070 by 
Madras 108 274 
Other centres 313 615 
Total 2,320 6,002 


(B. Com., Bangalore, April, 1969) 
28. Explain (i) a bar-chart, and (ii) a pie-diagram.
Draw a bar-chart for the following data : 
Percentage of total population

Villages Towns
Infant and young children 13 12.9
Boys and girls 2571 23.2
Young men and women 32.3 36.5
Middle aged men and women zu 20 


Elderly persons 
(B. Com., Madras, 1969) 


29. (a) What are the merits and limitations of diagrammatic representation
of statistical data ? Write short notes on any three important methods used
for diagrammatic representation. (B. Com., Madras, 1966)
(b) The following table gives the production of paper (to the nearest
1,000 tonnes) in India for certain years. Draw a bar-diagram to bring out the
different components.
Production (to the nearest '000 tonnes)

1971 1972 1973
Printing and writing 65 70 103
Wrapping 16 15 24
Special varieties 7 5 5
Boards 18 19 13
Total 106 109 145
30. Draw a rectangular diagram to represent the following information :
Factory A Factory B
Price per unit of a commodity Rs. 6 Rs. 6
Quantity produced 1,000 units 800 units
Value of raw materials used Rs. 3,000 Rs. 2,400
Other expenses of production Rs. 2,000 Rs. 1,400
Profits Rs. 1,000 Rs. 1,000
(B. Com., Karnatak, 1968)
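One common way to draw the rectangular diagram of Exercise 30 is to make the width of each rectangle proportional to the quantity produced and its height equal to the price per unit. The area then represents total value, and the height is divided according to costs and profits per unit. The sketch below computes those per-unit figures. It takes Factory B's output as 800 units, the only figure consistent with its costs and profits (Rs. 2,400 + Rs. 1,400 + Rs. 1,000 = Rs. 4,800 = 800 × Rs. 6). The helper names are illustrative.

```python
def per_unit_breakdown(quantity, price, components):
    """Split the price of one unit into its cost and profit components."""
    total = sum(components.values())
    if total != quantity * price:
        raise ValueError("components do not add up to quantity x price")
    return {name: amount / quantity for name, amount in components.items()}


FACTORY_A = {"Raw materials": 3000, "Other expenses": 2000, "Profits": 1000}
FACTORY_B = {"Raw materials": 2400, "Other expenses": 1400, "Profits": 1000}

# Each rectangle: width = quantity produced, height = price per unit (Rs. 6),
# with the height sub-divided by the per-unit figures printed below.
for name, qty, parts in (("A", 1000, FACTORY_A), ("B", 800, FACTORY_B)):
    print(name, "width", qty, per_unit_breakdown(qty, 6, parts))
```

Both rectangles have the same height (Rs. 6). Factory B's rectangle is narrower and has a larger profit segment per unit (Rs. 1.25 against Rs. 1.00).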


31. The following table gives the Index Numbers of wholesale prices in
India :

Years Food Articles муа. Articles
1968 306 286
1969 383 346
1970 391 347
1971 416 354
1972 399 401

Plot the given data on graph paper.
(B. Com., Osmania)


32. Draw the histogram, frequency curve and the ogive curve for the
following data : 


Class interval Frequency Class interval Frequency 
90—100 10 140—150 320
100—110 35 150—160 200
110—120 140 160—170 75
120—130 300 170—180 45
130—140 370 180—190 15 


(B. Com., Poona, 1965) 


33. What is meant by a histogram ? State briefly how it is constructed.
Indicate clearly how the histogram in respect of the following data can be drawn
(only a rough sketch is required). State also how you can draw histograms in
respect of unequal intervals.
Mid-value Frequency Mid-value Frequency
115 6 165 60 
125 25 175 38 
135 48 185 22 
145 72 195 3 
155 116 


(C.A., Nov., 1966)


34. The following table gives the distribution of monthly income of 700 
middle class families in a certain city : 


Monthly income Frequency Monthly income Frequency 
(in Rs.) (in Rs.) 

Below 100 120 400—500 70
100—200 135 500—600 25
200—300 150 600 and over 20
300—400 180


Draw an ogive for the above data. Obtain the limits of income of the
central 50% of the observed families.
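Reading the central 50% from an ogive in Exercise 34 amounts to finding the first and third quartiles. These are the incomes below which 175 and 525 of the 700 families lie. The sketch below finds them by linear interpolation within a class, which is what a straight-line ogive gives. It assumes that the class printed with a lower limit of 520 runs from 500 to 600; the neighbouring classes and the total of 700 both imply this. It also treats "Below 100" as 0 to 100. The names are illustrative.

```python
from itertools import accumulate

# (lower, upper, frequency). "Below 100" is taken as 0-100 and "600 and over"
# is left open-ended (upper=None); both are assumptions of this sketch.
CLASSES = [(0, 100, 120), (100, 200, 135), (200, 300, 150), (300, 400, 180),
           (400, 500, 70), (500, 600, 25), (600, None, 20)]


def less_than_ogive(classes):
    """Points (upper class boundary, cumulative frequency) of a 'less than' ogive."""
    cumulative = accumulate(f for _, _, f in classes)
    return [(upper, c) for (lower, upper, f), c in zip(classes, cumulative)
            if upper is not None]


def ogive_reading(classes, fraction):
    """Value below which `fraction` of the items lie, by linear interpolation
    inside the class where the cumulative frequency crosses fraction * N."""
    n = sum(f for _, _, f in classes)
    target = fraction * n
    below = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if below + f >= target:
            if upper is None:
                raise ValueError("reading falls in the open-ended class")
            return lower + (target - below) / f * (upper - lower)
        below += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = ogive_reading(CLASSES, 0.25)  # 175th family
q3 = ogive_reading(CLASSES, 0.75)  # 525th family
print(less_than_ogive(CLASSES))
print(f"Central 50% of families: Rs. {q1:.2f} to Rs. {q3:.2f}")
```

On these assumptions, the central 50% of families have monthly incomes between roughly Rs. 140.74 and Rs. 366.67. A hand-drawn smooth ogive should give readings close to these.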


35. Represent the following data by means of a histogram : 


Weekly wages No. of workers Weekly wages No. of workers 


(in Rs.) (in Rs.) 

10—15 7 30—40 12 
15—20 19 40—50 12 
20—25 27 60—80 8 
25—30 15 


(C.A., May, 1968) 
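When classes are of unequal width, as in Exercise 35, the bars of a histogram must be drawn so that their areas, not their heights, are proportional to the frequencies. A common rule is to express each frequency per standard width: height = frequency × (standard width / class width). The Python sketch below takes the narrowest width of 5 as standard. The helper name is illustrative, and the classes are used exactly as printed.

```python
# (lower, upper, number of workers); classes exactly as printed in the exercise.
WAGES = [(10, 15, 7), (15, 20, 19), (20, 25, 27), (25, 30, 15),
         (30, 40, 12), (40, 50, 12), (60, 80, 8)]
STANDARD_WIDTH = 5  # the narrowest class width, used as the unit of width


def histogram_heights(classes, standard_width):
    """Adjusted bar heights so that bar AREA, not height, is proportional
    to frequency: height = f * standard_width / class_width."""
    return [(lo, hi, f * standard_width / (hi - lo)) for lo, hi, f in classes]


for lo, hi, h in histogram_heights(WAGES, STANDARD_WIDTH):
    print(f"{lo:>2}-{hi:<2} height {h:5.1f}")
```

The classes 30 to 40 and 40 to 50 are each drawn to a height of 6, and the class 60 to 80 to a height of 2. Each bar's area therefore still represents its frequency.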


36. The following table gives the hourly wages and number of persons 
receiving the wages in a company : 


Hourly wages No. of Hourly wages No. of 


in cents persons in cents persons 
(x) (f) (x) (f)
0—10 18 60—70 177
10—20 196 70—80 52
20—30 438 80—90 17
30—40 844 90—100 3
40—50 744 100—110 1
50—60 440
Draw on graph paper (i) the histogram, and (ii) the frequency polygon for
the distribution. (I.A.S., 1965)


37. The following table relates to rupee loans and small savings in India
during 1961-70 :

Year Loans Savings Year Loans Savings
(Rs. Lakhs) (Rs. Lakhs) (Rs. Lakhs) (Rs. Lakhs)

1961 200 100 1966 220 95 

1962 222 95 1967 226 100 

1963 240 105 1968 240 #0 

1964 220 100 1969 236 75 

1965 212 90 1970 244 70


Represent the above data by means of an index histogram. 
(B. Com., Andhra, 1971) 


38. Following are the marks (out of 100) obtained by 50 students in
Statistics : 
ТОБЕ 25100542: 57 45
5411691539 ee c8 4 cep msg? 61-1 65 42 
50. 52. (Stu #АП 45" (55. i36 59 $63. 39 
65 КИЗ Майн ү ТОА IS 102. AN 7 5277735 
40 735 ПЫ) 26920140 55 4618 


(i) Make a frequency distribution taking a class-interval of 10 marks. 
(Take the first class-interval as 0—10.) 


(ii) Draw a histogram and a frequency polygon from the frequency
distribution. (B. Com., Delhi, 1968)


39. The following table gives the price of wheat and cotton in rupees 
(for 5 kg.) : 


Year Wheat Cotton Year Wheat Cotton
1968 Tl 34.1 1972 8.4 34.8
1969 8.5 39.8 1973 8.2 32.9
1970 8.6 37.3 1974 8.8 33.2
1971 8.7 33.3


Compare the fluctuations of prices of the two commodities by drawing 
graphs on the same paper : 


(a) with the data as they are, 


(b) with the data as they are, but with the scale for cotton equal to
one-fifth that for wheat, and


(c) with the price-relatives taking 1968 as base. (I.C.W.A., 1967). 
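Part (c) asks for price relatives with 1968 = 100, that is, each year's price divided by the 1968 price and multiplied by 100. The sketch covers cotton only. It reads the printed figures as 34.1, 39.8, and so on (decimal points assumed). Wheat is left out because its 1968 value is illegible in this copy.

```python
def price_relatives(prices, base_year):
    """Each year's price as a percentage of the base-year price."""
    base = prices[base_year]
    return {year: round(p / base * 100, 1) for year, p in prices.items()}

cotton = {1968: 34.1, 1969: 39.8, 1970: 37.3, 1971: 33.3,
          1972: 34.8, 1973: 32.9, 1974: 33.2}
for year, rel in price_relatives(cotton, 1968).items():
    print(year, rel)
```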
40. The population of a city during the last forty years is given below : 
1931 40,312 1961 68,226 
1941 47,426 1971 107,336 
1951 49,326 


Estimate with the help of a graph the population of the city in 1966. 
(B. Com., Andhra, 1972) 


41. The following table gives the total units produced at the beginning of
different years. Represent the data graphically and estimate the mid-year value
for 1959 and 1963.


Year Units produced Year Units produced
1957 20 1962 811 

1958 62 1963 1,104 

1959 147 1964 1,425 

1960 300 1965 1,755 

1961 536 


(I.C.W.A., 1968) 
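Exercises 40 and 41 both read a value between two plotted points. If the graph is approximated by straight segments, the estimate is ordinary linear interpolation. This is an assumption: a freehand smooth curve may give a somewhat different reading. Since the units in Exercise 41 are counted at the beginning of each year, the mid-year value for 1959 lies halfway between the 1959 and 1960 figures.

```python
def interpolate(x0, y0, x1, y1, x):
    """Straight-line interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Exercise 40: population in 1966 from the 1961 and 1971 figures.
pop_1966 = interpolate(1961, 68226, 1971, 107336, 1966)
# Exercise 41: mid-year output from beginning-of-year figures.
mid_1959 = interpolate(1959, 147, 1960, 300, 1959.5)
mid_1963 = interpolate(1963, 1104, 1964, 1425, 1963.5)
print(pop_1966, mid_1959, mid_1963)   # 87781.0 223.5 1264.5
```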


42. Represent the data showing the number of companies in various ranges 
of subscribed capital (obtained from the National Council of Applied Economic 
Research, New Delhi) by means of a histogram : 


Range No. of Range No. of
(Rs. Lakhs) Companies (Rs. Lakhs) Companies
Up to 10 10 50—80 7
10—20 12 80—100 8
20—30 10 Above 100 5
30—50 14
(B. Com., Andhra, 1968)


43. Draw histogram, frequency polygon and ogive for the following data : 
Marks Frequency Marks Frequency
0—10 4 40—50 20
10—20 10 50—60 18
20—30 16 60—70 8 
30—40 22 70—80 2 


(B. Com., Poona, 1969) 


44. Represent the following information and also the cost and profit per
unit diagrammatically :
Factory Wages Material Other costs Profit No. of units produced
(Rs.) (Rs.) (Rs.) (Rs.)
A 6,000 10,000 2,000 2,000 2,000
B 4,000 6,000 1,660 1,000 1,400
(B. Com., Aligarh, 1968)


45. A rupee spent on khadi is distributed as follows : 


Paise 

Farmer 19 
Carder and Spinner 35 
Weaver 28 
Washerman, Dyer and 
Printer 8 
Administrative Agency 10 

Total 100 


Present this information by a suitable diagram. 
(B. Com., Delhi, 1969 ; B. Com., Bombay, 1970)
(Hint : Pie diagram)
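For the pie diagram, each component gets a sector angle proportional to its share of the total. Here the total is 100 paise, so one paisa corresponds to 3.6 degrees. A minimal sketch:

```python
def pie_angles(values):
    """Sector angle in degrees for each component, proportional to its share."""
    total = sum(values.values())
    return {name: v / total * 360 for name, v in values.items()}

rupee = {"Farmer": 19, "Carder and Spinner": 35, "Weaver": 28,
         "Washerman, Dyer and Printer": 8, "Administrative Agency": 10}
for name, angle in pie_angles(rupee).items():
    print(f"{name}: {angle:.1f} degrees")
```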


46. Plot the following figures on a graph paper and show also the balance of trade.
1967—68 1968—69
Month Imports Exports Imports Exports
(In crores) (In crores) (In crores) (In crores)
April 22 28 2 18
May 24 28 21 20
June 26 26 19 17
July 28 21 18 17
August ЗЕ 20 21 20
September 29 22 20 20
October 32 21 23 18
November 32 19 26 20
December 32 20 23 22
January 31 19 28 23
February 25 18 20 22
March 24 19 21 28
(B.A., Pilani, 1969)

47. State briefly, giving reasons, the types of diagrams considered most
appropriate to use in dealing with each of the following types of information :
1. Monthly output of eggs divided into four grades.
2. Number of children per family in Meerut city.
3. Monthly production of pig iron from 1964 to 1968.
4. Monthly rainfall for five years. (M.A. Econ., Meerut, 1970)


48. Draw a ‘Less than’ ogive for the following monthly income distribution
of 600 middle class families in a certain city :


Monthly income Frequency 

(Rs.) 

Below 75 69 
75—150 167 
150—225 206 

225—300 65 

300—375 58 

375—450 25 

450 & over 10 


(a) Find the limits of income of the central 50% of the families 


(b) If a tax is to be collected from those families having income exceeding
Rs. 350, what percentage of families will be asked to pay the tax ?
(B. Com., Bombay, 1970)
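Both parts can be checked arithmetically. Part (a) is the same interpolation as reading Q1 and Q3 off the ogive. For (b), the families above Rs. 350 are taken as all of the classes above 375 plus the fraction (375 - 350)/75 of the 300-375 class. This assumes incomes are spread evenly within that class. The open ends are closed at 0 and 525 purely for the calculation.

```python
def grouped_quantile(bounds, freqs, p):
    """Interpolated value below which a fraction p of the families lie."""
    target, cum = p * sum(freqs), 0
    for i, f in enumerate(freqs):
        if cum + f >= target:
            return bounds[i] + (target - cum) / f * (bounds[i + 1] - bounds[i])
        cum += f
    raise ValueError("p must lie between 0 and 1")

def count_above(bounds, freqs, x):
    """Number of observations above x, assuming an even spread within each class."""
    total = 0.0
    for lo, hi, f in zip(bounds, bounds[1:], freqs):
        if x <= lo:
            total += f
        elif x < hi:
            total += f * (hi - x) / (hi - lo)
    return total

bounds = [0, 75, 150, 225, 300, 375, 450, 525]   # open ends closed (assumed)
freqs = [69, 167, 206, 65, 58, 25, 10]
q1 = grouped_quantile(bounds, freqs, 0.25)
q3 = grouped_quantile(bounds, freqs, 0.75)
taxed_pct = count_above(bounds, freqs, 350) / sum(freqs) * 100
print(f"(a) Rs. {q1:.2f} to Rs. {q3:.2f}   (b) {taxed_pct:.2f}% of families")
```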


49. The following table shows the areas in millions of square kms of
oceans of the world :
Ocean Area (million sq. kms.)
Pacific 70.8
Atlantic 41.2
Indian 28.5
Antarctic 7.6
Arctic 4.8


Which chart would you choose to graph the data—a pie or a bar chart ? 
(B.A. Hons. Econ., Delhi, 1971)
(Hint : Pie diagram would be better) 
50. The following data give the number of lottery tickets (in thousands)
sold in the months of October, November and December 1972 at Bombay, Poona,
Nagpur and Sholapur :
Cities Bombay Poona Nagpur Sholapur
Months
October 450 125 230 75
November 560 154 320 120
December 490 115 210 60
Represent the above data by a suitable diagram. (B. Com., Poona, 1973)


51. Prepare a frequency distribution for the following figures giving yields
of a certain crop per hectare for 50 crop cutting experiments :


Yield in kg. 


531 523 58 51 540 532 550 563 
E BH 527 534 50 555 50 55 532 569 
$05 575 55 550 556 543 59 548 530 537 
347 557 547 536 54] 558 59 555 538 531 
330 545 53 548 532 58 549 529 531 535 
Take classes as 510—520, 520—530, ... and so on. Plot a histogram and
frequency polygon of the frequency distribution so obtained.
(B. Com., Poona, 1973)


52. The following table gives the distribution of 160 workers in a factory : 


Wages (Rs.) No. of workers
More than 80 160
More than 90 151
More than 100 134
More than 110 104
More than 120 60
More than 130 29
More than 140 10
More than 150 0
Draw an ‘ogive curve’ from the above data and determine :
(i) the number of workers earning less than Rs. 135 ;
(ii) the number of workers earning at least Rs. 118 ; and
(iii) the median wage of the workers. (B. Com., Poona, 1973)
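Reading a ‘more than’ ogive amounts to linear interpolation between neighbouring plotted points. The sketch reads the three required values. It treats wages as continuous, so ‘at least Rs. 118’ and ‘more than Rs. 118’ coincide (an assumption). The median is the wage exceeded by half of the 160 workers. The helper names are illustrative.

```python
wages = [80, 90, 100, 110, 120, 130, 140, 150]
more_than = [160, 151, 134, 104, 60, 29, 10, 0]
segments = list(zip(wages, more_than, wages[1:], more_than[1:]))

def workers_above(x):
    """Interpolated number of workers earning more than x."""
    for x0, y0, x1, y1 in segments:
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("wage outside the table")

def wage_exceeded_by(count):
    """Inverse reading: the wage that exactly `count` workers exceed."""
    for x0, y0, x1, y1 in segments:
        if y1 <= count <= y0:
            return x0 + (x1 - x0) * (y0 - count) / (y0 - y1)
    raise ValueError("count outside the table")

less_than_135 = 160 - workers_above(135)     # (i)
at_least_118 = workers_above(118)            # (ii)
median_wage = wage_exceeded_by(160 / 2)      # (iii)
print(less_than_135, round(at_least_118, 1), round(median_wage, 2))
```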


53. Represent the following information and also the cost and profit per unit
diagrammatically :
Factory Wages Materials Other costs Profit No. of units produced
A 3,000 5,000 1,000 1,000 1,000
B 2,000 3,000 800 500 700


(B. Com., Agra, 1972) 
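Per-unit figures are each total divided by the number of units produced. Their sum is the selling price per unit, which is what a per-unit bar would show. A minimal sketch for this exercise:

```python
factories = {
    "A": {"Wages": 3000, "Materials": 5000, "Other costs": 1000, "Profit": 1000, "units": 1000},
    "B": {"Wages": 2000, "Materials": 3000, "Other costs": 800, "Profit": 500, "units": 700},
}

def per_unit(record):
    """Each cost or profit item divided by the number of units produced."""
    units = record["units"]
    return {item: value / units for item, value in record.items() if item != "units"}

for name, record in factories.items():
    items = per_unit(record)
    price = sum(items.values())           # cost + profit = selling price per unit
    print(name, {k: round(v, 2) for k, v in items.items()}, "price:", round(price, 2))
```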


54. Given below are the marks obtained by 60 students of a class in a
certain subject. Represent this information graphically and find the median
marks from the graph :
Marks out of 100 No. of students
Less than 10 4
Less than 20 8
Less than 30 25
Less than 40 14
Less than 50 41
Less than 60 46
Less than 70 50
Less than 100 60
(B. Com., Raj., 1973)
55. (a) Discuss the usefulness of diagrammatic representation of facts.
(b) Draw a suitable diagram to represent the following information :
Qty. Selling price Wages Materials Misc. Total
per unit (Rs.) (Rs.) (Rs.) (Rs.) (Rs.)
Factory X 400 20 3,200 2,400 1,600 7,200
Factory Y 600 30 6,000 6,000 9,000 21,000
Show also the profit or loss as the case may be.
(B. Com., Delhi, 1974)
56. Represent the following by sub-divided bars drawn on a percentage
basis :

Particulars 1968 1969 1970
Cost per chair (Rs.) (Rs.) (Rs.)
1. Wage 4.50 7.50 10.50
2. Material 3.00 5.10 7.00
3. Polishing 1.50 2.40 3.50
Total cost 9.00 15.00 21.00
Sale per chair 10.00 15.00 20.00
Profit (+) Loss (-) +1.00 Nil -1.00
(B. Com., Rajasthan, 1974)
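For bars on a percentage basis, each item is expressed as a percentage of the bar's total. The sketch below takes total cost as the base, so every bar reaches 100. Using the sale price as the base instead, so that profit or loss shows as a further segment, is an equally common choice, and the exercise does not say which is intended.

```python
cost_per_chair = {
    1968: {"Wage": 4.50, "Material": 3.00, "Polishing": 1.50},
    1969: {"Wage": 7.50, "Material": 5.10, "Polishing": 2.40},
    1970: {"Wage": 10.50, "Material": 7.00, "Polishing": 3.50},
}

def percentage_breakup(components):
    """Each component as a percentage of the total of all components."""
    total = sum(components.values())
    return {name: round(value / total * 100, 2) for name, value in components.items()}

for year, parts in cost_per_chair.items():
    print(year, percentage_breakup(parts))
```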


57. The following data give the number of lottery tickets (in thousands) 
sold in the months of October, November and December 1972 at Bombay, Poona, 
Nagpur and Sholapur : 


Cities Bombay Poona Nagpur Sholapur
Months
October 450 125 230 75
November 560 154 320 120
December 490 115 210 60


Represent the above data by a suitable diagram. 
(B. A., Bombay, 1973) 


58. Represent the following data by a suitable diagram showing the 
difference between the sale price and the cost price : 


Year Sale Price (Rs.) Cost Price (Rs.)
1969 27.0 19.5
1970 27.3 21.7
1971 28.2 30.0
1972 30.3 25.6
1973 32.7 26.1
1974 33.3 34.2
(M.A. Econ., Meerut, 1974)


59. The following table gives the details of the cost of construction of a
house in Agra :
Rs. Rs.
Land 4,500 Cement 800
Labour 2,500 Lime 800
Bricks 2,000 Stone 600
Iron 1,800 Sand 200
Timber 1,500 Other things 1,300
Represent the above figures by a suitable diagram.
(B. Com., Kurukshetra, 1974)
(Hint : Convert the data into percentages of the total and use a pie diagram.)
SECTION 7 


MEASURES OF CENTRAL VALUE 


1. What is a statistical average ? What are the desirable properties for
an average to possess ? Mention different types of averages and state why the
arithmetic mean is the most commonly used amongst them.
(B. Com., Madras, 1966)
2. What is meant by measures of central tendency ?
(M.A. Econ., Meerut, 1975)


3. What do you understand by ‘central tendency’ ? Under what conditions
is median more suitable than other measures of central tendency ?
(B. Com., Delhi, 1975)


4. What do you understand by an average in Statistics ? Describe briefly
the characteristics and uses of the most important averages. (C.A., 1971)


5. “Average is a number indicating the central value of a group of observations.”
How far is it true for mean, median and mode ? Give illustrations.
(B. Com., Bombay, 1969)


6. What is the difference between simple and weighted average ? Explain
the circumstances under which the latter should be used in preference to the
former. (C.A., 1972)


7. (a) Define the different measures of central tendency explaining how 
each one of them can be computed for a given frequency distribution. 


(B.A. Hons., Econ., Delhi, 1969)
(b) “An average is a substitute for a complex group of variables but it
is not always safe to depend on the substitute alone to the exclusion of individual
members of the group.” Discuss. (B. Com., Punjab, 1974)
8. How would you account for the predominant choice of arithmetic mean
of statistical data as a measure of central tendency ? Under what circumstances
would it be appropriate to use mode or median ? (B.A. Hons. Econ., Delhi, 1974)
9. (a) State the empirical relationship between mean, median and mode
for unimodal frequency curves which are moderately asymmetrical.
(M. Com., Delhi, 1969)
(b) What are the properties of a good average? 
10. (a) What are desiderata for a satisfactory average ? Examine the
geometric mean in the light of these desiderata and bring out the special properties
that make it useful in the construction of index numbers. (M. Com., Delhi, 1966)


(b) What are desiderata of a good average ? Compare the mean, the
median and the mode in the light of these desiderata. Why are averages called
measures of central tendency ? (B. Com., Bombay, 1970)


11. In each of the following cases, explain whether the description applies
to mean, median or both :
(i) Can be calculated from a frequency distribution with open-end
classes.
(ii) The values of all items are taken into consideration in the calculation.
(iii) The values of extreme items do not influence the average.
(iv) In a distribution with a single peak and moderate skewness to the
right it is closer to the concentration of the distribution.
(C.A., Nov., 1969)

12. Each type of average has its particular field of usefulness. Examine
mean, median and mode in the light of this statement. (B. Com., Bombay, 1972)

13. (a) What are the criteria of a satisfactory measure of central
tendency ?
(b) Discuss the standard measures and say which of these satisfy your
criteria. (M. Com., Delhi, 1970)
(c) What are the measures of central value or tendency ? Describe their
characteristics and state what considerations determine the use of particular
measures. (B.A., Bombay, 1970)
14. (a) It is said that the choice of an average depends on the particular
problem on hand. Comment on this and include at least one instance of the use of
median, mode, geometric and harmonic mean. (C.A., Nov. 1972)


(b) What claims have the median and mode for use as measures of central
tendency ? Why are measures of central tendency not sufficient to compare two
or more frequency distributions ? (B.A. Hons., Econ., Delhi, 1973)
(c) Discuss the relative merits of the median and the arithmetic mean as
measures of central tendency. (B.A. Hons. Econ., Delhi, 1975)




15. The following table gives the monthly income of twelve families in a
town :
S. No. Monthly Income S. No. Monthly Income
(Rs.) (Rs.)
1 280 7 80
2 180 8 84
3 96 9 100
4 98 10 75
5 104 11 600
6 75 12 200
Calculate the arithmetic average, the median and the mode of the above
incomes. Which average would represent the above series best ?
(M.A. Econ., Lucknow, 1967)
[X̄ = 164.33, Med. = 99, Mo. = 75]
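Python's standard `statistics` module computes all three averages directly, which makes the printed answer easy to check. The two large incomes (280 and 600) pull the mean well above most of the families. That is the usual argument for preferring the median for this series.

```python
import statistics

incomes = [280, 180, 96, 98, 104, 75, 80, 84, 100, 75, 600, 200]
avg = statistics.mean(incomes)      # pulled up by the two large incomes
med = statistics.median(incomes)    # mean of the 6th and 7th values when sorted
mo = statistics.mode(incomes)       # 75 is the only value occurring twice
print(f"Mean {avg:.2f}, median {med}, mode {mo}")
```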


16. The following are the monthly salaries in rupees of 20 employees of a 
firm : 


130 62 145 .— 118.22 125^. 76. ASI 142 110 98 
65 16 100 103 71 85 80 122 132 . 95 


The firm gives bonuses of Rs. 10, 15, 20, 25 and 30 for individuals in the
respective salary groups exceeding Rs. 60 but not exceeding Rs. 80, exceeding
Rs. 80 but not exceeding Rs. 100, and so on up to exceeding Rs. 140 but not
exceeding Rs. 160. Find the average bonus paid per employee. (C.A., Nov., 1968)
[Average bonus paid = Rs. 19.5]


17. (a) Compute the mean, median and mode for the following distribution :
Height of women No. of women Height of women No. of women
(in inches) (in inches)
60 27 64 210
61 146 65 128
62 435 66 98
63 398
(B. Com., Andhra, 1967)
[X̄ = 62.97, Med. = 63, Mo. = 63]


(b) Calculate the mode, median and arithmetic average from the following
frequency distribution of marks at a test in Statistics :
Marks : DO OR ES): 52020825 Э: ООА TA 


No. of ED UN e ea EM e ME TN 
(B. Com., Nagpur, 1971)
[Mo. = 20, Med. = 20, X̄ = 22.1]
18. Calculate the mean from the following frequency table :
Mid-points 1 2 3 4 5 6 7 8 9
Frequency 2 60 101 152 205 155 79 40 1
(I.C.W.A., 1968)
[X̄ = 4.87]
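The step-deviation (assumed mean) shortcut used in hand computation gives the same mean as summing f times x directly. A minimal sketch with the assumed mean A = 5; any convenient mid-point would do:

```python
mids = list(range(1, 10))
freqs = [2, 60, 101, 152, 205, 155, 79, 40, 1]

A = 5                                   # assumed mean
n = sum(freqs)
sum_fd = sum(f * (m - A) for m, f in zip(mids, freqs))
mean_short = A + sum_fd / n             # shortcut method
mean_direct = sum(f * m for m, f in zip(mids, freqs)) / n
print(n, sum_fd, round(mean_short, 2))
```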
19. For the following frequency table calculate mean, median and mode :
Monthly rent No. of families Monthly rent No. of families
(in Rs.) paying the rent (in Rs.) paying the rent
20—40 6 120—140 15
40—60 9 140—160 10
60—80 11 160—180 8
80—100 14 180—200 7
100—120 20
(B. Com., Delhi, 1967)
[X̄ = 110, Med. = 110, Mo. = 110.91]
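The standard grouped-data formulas can be checked in a few lines: the mean from mid-points, and the median and mode interpolated inside the median and modal classes. The sketch takes the 60-80 frequency as 11, which makes the total 100 and reproduces all three printed answers. The mode formula used is Mode = L + (f1 - f0) / (2f1 - f0 - f2) × i, which assumes the modal class is neither the first nor the last.

```python
lower = list(range(20, 200, 20))        # class lower limits 20, 40, ..., 180
width = 20
freqs = [6, 9, 11, 14, 20, 15, 10, 8, 7]
n = sum(freqs)

mean = sum((lo + width / 2) * f for lo, f in zip(lower, freqs)) / n

def grouped_median(lower, freqs, width):
    half, cum = sum(freqs) / 2, 0
    for lo, f in zip(lower, freqs):
        if cum + f >= half:
            return lo + (half - cum) / f * width
        cum += f

k = freqs.index(max(freqs))             # modal class (not first or last here)
f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
mode = lower[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width
median = grouped_median(lower, freqs, width)
print(round(mean, 2), round(median, 2), round(mode, 2))
```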
20. Calculate the value of :
(a) the median,
(b) the mode, and
(c) the two quartiles from the following data :
Age No. of persons Age No. of persons
20—25 50 40—45 150
25—30 70 45—50 120
30—35 100 50—55 70
35—40 180 55—60 60
(I.C.W.A., 1966)
[Med. = 40, Q1 = 34, Q3 = 47.08, Mode = 38.64]
21. For the frequency distribution given below :
(i) Draw a histogram and a smooth frequency curve on a graph paper.
(ii) Calculate the mode.
(x) 1—9 9—17 17—25 25—33
31 27 15
33—41 41—49 49—65
(f) 20 10 7 8


(M. Com., Delhi, 1968). 


22. Draw a percentage ogive and obtain the median weight from the data
given below :
Weight in pounds Frequency Weight in pounds Frequency
118—126 3 154—162 5
127—135 5 163—171 4
136—144 9 172—180 2
145—153 12
(M.A. Econ., Delhi, 1968)
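Because the classes are written inclusively (118—126, 127—135, ...), there is a one-pound gap between successive classes. The usual correction extends each class by half a pound on each side, giving boundaries 117.5, 126.5, and so on. The sketch builds those boundaries, the cumulative percentages that would be plotted in the percentage ogive, and the interpolated median. No printed answer is given to compare against.

```python
from itertools import accumulate

classes = [(118, 126), (127, 135), (136, 144), (145, 153),
           (154, 162), (163, 171), (172, 180)]
freqs = [3, 5, 9, 12, 5, 4, 2]
n = sum(freqs)

# True class boundaries: close the 1-lb gaps between inclusive classes.
bounds = [classes[0][0] - 0.5] + [hi + 0.5 for _, hi in classes]
# y-values of the percentage ogive, plotted at each upper boundary.
cum_pct = [c / n * 100 for c in accumulate(freqs)]

half, cum = n / 2, 0
for i, f in enumerate(freqs):
    if cum + f >= half:
        median = bounds[i] + (half - cum) / f * (bounds[i + 1] - bounds[i])
        break
    cum += f
print(bounds, [round(p, 1) for p in cum_pct], median)
```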
23. Calculate the simple and weighted arithmetic mean price per ton of
coal purchased by an industry for the half-year :
Month Price per ton Tons purchased Month Price per ton Tons purchased
(Rs.) (Rs.)
Jan. 42.50 25 April 52.00 50
Feb. 51.25 30 May 44.25 10
March 50.00 40 June 54.00 45
Account for the difference between the two.
(C.A., Nov., 1966)
[X̄ = 49, X̄w = 50.36]
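A short check of both means. The simple mean treats each month alike, while the weighted mean weights each price by the tonnage bought. The weighted figure is higher because the largest purchases (50 tons in April and 45 in June) were made at the highest prices.

```python
prices = [42.50, 51.25, 50.00, 52.00, 44.25, 54.00]   # Rs. per ton, Jan. to June
tons = [25, 30, 40, 50, 10, 45]

simple = sum(prices) / len(prices)
weighted = sum(p * w for p, w in zip(prices, tons)) / sum(tons)
print(f"Simple mean Rs. {simple:.2f}, weighted mean Rs. {weighted:.4f}")
```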
24. Find graphically the median value from the following data on yield
of grain in pounds per 1/500 acre :
Yield No. of plots Yield No. of plots
2.7—2.9 4 4.1—4.3 69
2.9—3.1 15 4.3—4.5 59
3.1—3.3 20 4.5—4.7 35
3.3—3.5 47 4.7—4.9 10
3.5—3.7 63 4.9—5.1 8
3.7—3.9 78 5.1—5.3 4
3.9—4.1 88
Determine the modal value from its approximate relationship with mean
and median. (I.A.S., 1966)
[Mode = 3 Median - 2 Mean ; Mode = 3.96]
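The approximate (empirical) relation Mode = 3 Median - 2 Mean can be checked numerically once the mean and median are computed from the grouped table. The sketch reads the yields as 2.7-2.9, 2.9-3.1, and so on, with decimal points restored and a class width of 0.2.

```python
width = 0.2
lower = [round(2.7 + width * i, 1) for i in range(13)]   # 2.7, 2.9, ..., 5.1
freqs = [4, 15, 20, 47, 63, 78, 88, 69, 59, 35, 10, 8, 4]
n = sum(freqs)

mean = sum((lo + width / 2) * f for lo, f in zip(lower, freqs)) / n

def grouped_median(lower, freqs, width):
    half, cum = sum(freqs) / 2, 0
    for lo, f in zip(lower, freqs):
        if cum + f >= half:
            return lo + (half - cum) / f * width
        cum += f

median = grouped_median(lower, freqs, width)
mode = 3 * median - 2 * mean          # empirical relation, moderately skewed data
print(round(mean, 4), round(median, 4), round(mode, 2))
```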


25. The following table gives the distribution of monthly income of 600
middle-class families in a certain city :
Monthly income Frequency Monthly income Frequency
(in Rs.) (in Rs.)
Below 75 69 300—375 58
75—150 167 375—450 24
150—225 207 450 and over 10
225—300 65
(a) Draw an ogive for the above data and thence obtain the median value.
Check it against the calculated value.
(b) Obtain the limits of income of the central 50 per cent of the observed
families. [(a) Median = Rs. 173.2, (b) Rs. 111.4 to Rs. 233.1]
26. Define: (a) the geometric mean, and 
(b) harmonic mean. 
Calculate the geometric mean of the following price relatives : 


Commodity Price Relative Commodity Price Relative 


Wheat 207 Sugar 124 
Rice 198 Salt 107 
Pulses 156 Oils 196 
(B. Com., Andhra, 1968) 
[G.M. = 159.8]
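The geometric mean can be checked with `statistics.geometric_mean` (Python 3.8+). It can also be computed as the antilog of the mean of the logarithms, which mirrors the log-table method:

```python
import math
import statistics

relatives = [207, 198, 156, 124, 107, 196]   # Wheat, Rice, Pulses, Sugar, Salt, Oils
gm = statistics.geometric_mean(relatives)
gm_logs = 10 ** (sum(math.log10(x) for x in relatives) / len(relatives))
print(round(gm, 1), round(gm_logs, 1))
```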


27. (a) Write a note on the merits and demerits of geometric mean.
(b) Calculate geometric mean from the following data :
65 1690 110 1125 142 155 355 2150
(B. Com., Andhra, 1966)
[G.M. = 42.74]


R-28 STATISTICAL METHODS 


28. The following table gives the distribution of 100 accidents during
seven days of the week of a given month. During the particular month there were
Day No. of accidents Day No. of accidents
Sunday 26 Thursday 8
Monday 16 Friday 10
Tuesday 12 Saturday 18
Wednesday 10
(C.A., 1968)
[Number of accidents per day = 14]

29. At harvesting time a farmer employed 10 men, 20 women and 16 boys
to lift potatoes. The women’s work was three-quarters as effective as that of a man,
while a boy's work was only half. Find the daily wage bill if a man’s rate
was 24 shillings a day and the rates for the women and boys were in proportion to their
effectiveness. Calculate the average daily rate for the 46 workers.
(Union of Lancashire and Cheshire Institute)
[Average daily rate = 17.2 shillings]
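This is a weighted average of the three daily rates, with the number of workers as weights. A minimal check:

```python
workers = {"men": 10, "women": 20, "boys": 16}
effectiveness = {"men": 1.0, "women": 0.75, "boys": 0.5}
man_rate = 24                                       # shillings a day

rates = {k: man_rate * e for k, e in effectiveness.items()}   # 24, 18, 12
wage_bill = sum(workers[k] * rates[k] for k in workers)
average_rate = wage_bill / sum(workers.values())
print(wage_bill, round(average_rate, 1))            # 792.0 17.2
```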


30. (a) When will you choose mode as a measure of central tendency as
against the arithmetic mean ? Illustrate with a suitable example.
(b) There are two sections in a class of students. Section A has 30
students and the average mark obtained is 50. Section B has 40 students and
the average mark is 45. What is the average mark of the class as a whole ?
[X̄12 = 47.14]
(c) Given the population figures for a country for 1960 and 1970, how will
you proceed to calculate the average rate at which population changed between
1960 and 1970 ? Explain briefly. (B.A. Hons., Econ., Delhi, 1971)


31. Out of the total population in a certain town in South Africa, 60%
belonged to the Black Race and the rest belonged to the White Race. It was
estimated that their mean incomes were respectively 2,000 and 5,000 pounds. Find
the average income of the entire town. (C.A., May, 1968)
[X̄12 = 3,200]


32. The following are the marks obtained by a batch of 20 students in a
certain class-test in English and Mathematics :
Roll No. 1 2 3 4 5 6 7 8 9 10
Marks in English 53 54 52 32 30 60 47 46 35 28
Marks in Maths. 58 55 25 32 26 85 44 80 33 72
Roll No. 11 12 13 14 15 16 17 18 19 20
Marks in English 25 r 043 48 72 51 45 33 65 29
Marks in Maths. — 10 42 js AB, p00 64 39. 38 |390 3g 
In which subject is the level of knowledge of the students higher ?
(B. Com., Gorakhpur, 1969)
[Median marks in English = 45.5 ; Median marks in Maths. = 41.5 ;
level of knowledge in English is higher.]


33. The following table gives the age distribution of males in an Indian
town. Find the mean age and the median age.
Age group Males Age group Males
0—9 2,756 50—59 610
10—19 2,124 60—69 245
20—29 1,677 70—79 67
30—39 1,481 80—89 6
40—49 1,021 90—99 3
(B. Com., Madras, 1969)
[X̄ = 23.65, Med. = 20.22]


34. The table gives the frequency distribution of marks obtained by 199
students in an examination. Find out the median and the percentage of failures
if the minimum marks for a pass is 35.
Marks No. of students Marks No. of students
0—20 21 51—60 24
21—30 19 61—70 18
31—40 60 71—80 15
41—50 42
(B. Com., Madras, 1967)
[Med. = 40.5, Failures = 35.2%]


35. Find the median and mode from the following table :
No. of days absent No. of students No. of days absent No. of students
Less than 5 29 Less than 30 644
Less than 10 224 Less than 35 650
Less than 15 465 Less than 40 653
Less than 20 582 Less than 45 655
Less than 25 634
(C.A., May, 1967 ; B. Com., Lucknow, 1968)
[Med. = 12.14, Mode = 11.35]
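Differencing the ‘less than’ totals recovers the class frequencies, after which the usual interpolation formulas apply. The sketch assumes classes 0-5, 5-10, ..., 40-45. It gives a median of about 12.15 and a mode of about 11.35, in line with the printed answers.

```python
upper = [5, 10, 15, 20, 25, 30, 35, 40, 45]
less_than = [29, 224, 465, 582, 634, 644, 650, 653, 655]
width = 5

freqs = [less_than[0]] + [b - a for a, b in zip(less_than, less_than[1:])]
lower = [u - width for u in upper]
n = less_than[-1]

cum_before = [0] + less_than[:-1]
i = next(j for j, c in enumerate(less_than) if c >= n / 2)   # median class
median = lower[i] + (n / 2 - cum_before[i]) / freqs[i] * width

k = freqs.index(max(freqs))                                  # modal class
f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
mode = lower[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width
print(freqs, round(median, 2), round(mode, 2))
```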

36. The following table gives the weekly wages in rupees in a certain
commercial organisation :
Weekly wages (Rs.) 30— 32— 34— 36— 38— 40—
Frequency 2 9 25 30 49 62
Weekly wages (Rs.) 42— 44— 46— 48—50
Frequency 39 20 11 3
Calculate from the above data :
(i) the median and the third quartile wages,
(ii) the number of wage-earners receiving between Rs. 37 and Rs. 45
per week. (I.C.W.A., 1968)
[(i) Med. = 40.32, Q3 = 42.54, (ii) 175]
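A sketch of the same computations, taking the 46-48 frequency as 11: that value makes the total 250 and reproduces all three printed answers. The count in (ii) assumes wages are spread evenly within each class, so half of the 36-38 class and half of the 44-46 class are included.

```python
lower = list(range(30, 50, 2))                 # 30, 32, ..., 48
width = 2
freqs = [2, 9, 25, 30, 49, 62, 39, 20, 11, 3]

def count_below(x):
    """Workers earning less than x, assuming an even spread inside each class."""
    total = 0.0
    for lo, f in zip(lower, freqs):
        total += f * min(max((x - lo) / width, 0.0), 1.0)
    return total

def quantile(p):
    target, cum = p * sum(freqs), 0
    for lo, f in zip(lower, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f

median, q3 = quantile(0.5), quantile(0.75)
between = count_below(45) - count_below(37)
print(round(median, 2), round(q3, 2), between)
```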
37. From the following table showing the wage distribution in a certain
factory, determine :
(a) the mean wage,
(b) the median wage,
(c) the modal wage,
(d) the wage limits for the middle 50% of the wage earners,
(e) the percentage of workers who earned between Rs. 75 and Rs. 125,
(f) the percentage who earned more than Rs. 150 per week, and
(g) the percentage who earned less than Rs. 100 per week.
Weekly wage No. of employees Weekly wage No. of employees
(Rs.) (Rs.)
Ў. 8 120—140 35
201965 12 140—160 18
60—80 20 160—180 7
80—100 30 180—200 2
100—120 40


[(a) X̄ = 107, (b) Med. = 108.75, (c) Mo. = 113.33, (d) 81.25 to 129.3,
(e) 48, (f) 12, (g) 40]




38. Calculate mean, median and mode from the following data of the
heights in inches of a group of students :
61, 62, 63, 61, 63, 64, 64, 60, 65, 63, 64, 65, 66, 64.
Now suppose that a group of students whose heights are 60, 66, 59, 68, 67
and 70 inches is added to the original group. Find mean, median and mode of
the combined group. (B. Com., Poona, 1969)
[First Group : X̄ = 63.2, Med. = 63.5, Mode = 64 ;
Combined Group : X̄ = 63.75, Med. = 64, Mode = 64]
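Both sets of answers can be checked with the `statistics` module. With an even number of heights, the median is the average of the two middle values.

```python
import statistics

first = [61, 62, 63, 61, 63, 64, 64, 60, 65, 63, 64, 65, 66, 64]
combined = first + [60, 66, 59, 68, 67, 70]

for label, heights in (("First group", first), ("Combined group", combined)):
    print(label,
          round(statistics.mean(heights), 2),
          statistics.median(heights),
          statistics.mode(heights))
```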


39. Six types of workers are employed in each of two workshops but at
different rates of wages as follows :
Workshop A Workshop B
Type of Rate of No. of Rate of No. of
worker wages workers wages workers
per worker per worker
Mechanic 2.50 2 3.00 18
Fitter 3.50 14 3.00 50
Electrician 4.00 20 4.25 8
Carpenter 3.00 7 3.50 12
Smith 3.00 6 3.50 10
Clerk 2.00 1 5.00 2
In which of the two workshops is the average rate of wages per worker
higher and by how much ? (M.A. Econ., Lucknow, 1967)
[Average wage : Workshop A Rs. 3.50, Workshop B Rs. 3.25 ;
average wage is higher in Workshop A by 25 paise]


40. In a sample survey of 60 workers’ families living in a factory area, the
following data were obtained as regards the number of members in the families.
Form a frequency distribution and find the mean and median family size.


ЭГЕ б: ИЗО Б ORA 9- 6 
Bum du Soo 2» 8 dc у Зо 6m Hs 
9 So 460. Meo Y, ES 655. iB 6:372 
6 а= 8 Zu S 12 аа, 10 6 
7 Le eet Sa E 5 МУ 6 4 
6, ul as: 2. G 9-  2qhv3 m 2015, 


(I.C.W.A., 1969)


[Hint : Form a discrete frequency distribution. X̄ = 5.97, Med. = 6]
41. The following frequency distribution is with regard to weight in
grams of mangoes of a given variety. If mangoes of weight less than 443 gm. be
considered unsuitable for the foreign market, what is the percentage of the total
yield suitable for it ? Assume the given frequency distribution to be typical of the
variety.


Wt. in gm. J Wt.i ^ 
410 419 14 450—459 4 
10 nerd 5 460—469 18 
— 7 
440—449 54 ое Т 
Draw an ogive of ‘more than’ type for the above data and find
the value of the median.
[Med. = 444]




42. From the results of the two colleges A and B given below, state which of
them is better and why.
Name of College A College B
Exam. Appeared Passed Appeared Passed
M.A. 30 25 100 80
M.Com. 50 45 120 95
B.A. 200 150 100 70
B.Com. 120 75 80 50
Total 400 295 400 295
(B. Com., Lucknow, 1968)
(Hint : Compute weighted average ; College A.)
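The point of the exercise shows up when pass percentages are computed exam by exam. The overall rates are identical, yet College A does at least as well as College B in every examination. A's overall figure is held down only because more of its candidates sat the B.A. and B.Com., where pass rates are lower. A sketch:

```python
results = {   # exam: (appeared, passed)
    "A": {"M.A.": (30, 25), "M.Com.": (50, 45), "B.A.": (200, 150), "B.Com.": (120, 75)},
    "B": {"M.A.": (100, 80), "M.Com.": (120, 95), "B.A.": (100, 70), "B.Com.": (80, 50)},
}

def pass_rates(exams):
    """Pass percentage for each exam and for all candidates together."""
    rates = {exam: passed / appeared * 100 for exam, (appeared, passed) in exams.items()}
    overall = (sum(p for _, p in exams.values()) /
               sum(a for a, _ in exams.values()) * 100)
    return rates, overall

for college, exams in results.items():
    rates, overall = pass_rates(exams)
    print(college, {e: round(r, 2) for e, r in rates.items()}, round(overall, 2))
```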


43. The table below shows the age distribution of heads of families in
country A during the year 1967 :
Age of head of Number Age of head of Number
family (years) (in million) family (years) (in million)
Under 25 2.22 45—54 9.47
25—29 4.05 55—64 6.63
30—34 5.08 65—74 4.16
35—44 10.45 75 and over 1.66
Do you think in the above case median is a better measure of central
tendency than the mean ? Give reasons. (B.A. Hons., Econ., Delhi, 1967)
[Med. = 44.56]


44. (a) The mean marks of 100 students were found to be 40. Later on
it was discovered that a score of 53 was misread as 83. Find the correct mean
corresponding to the correct score. [39.7]
(b) There were 500 workers working in a factory. Their mean wage was
calculated at Rs. 200. Later on, it was discovered that the wages of 2 workers
were misread as 180 and 20 in place of 80 and 220. Find the correct mean.
[200.2]
(c) A man travels from one city to another. The distance between the two
cities is 4 miles. He drives his car at 40 miles per hour. After travelling one
mile, the car stops running. He then travels in a tonga at 10 miles per hour. After
travelling a distance of 1.5 miles he leaves the tonga and covers the remaining
distance on foot at 4 miles per hour. Find the average speed per hour of that
person. [7.27]
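Parts (a) and (b) correct the total rather than the mean: correct total = n × wrong mean - wrong values + right values. Part (c) is a weighted harmonic mean of the speeds with the distances as weights, i.e. total distance divided by total time. The helper name is illustrative.

```python
def corrected_mean(n, wrong_mean, wrong_values, right_values):
    """Recompute a mean after replacing misread observations."""
    total = n * wrong_mean - sum(wrong_values) + sum(right_values)
    return total / n

mean_a = corrected_mean(100, 40, [83], [53])
mean_b = corrected_mean(500, 200, [180, 20], [80, 220])

legs = [(1, 40), (1.5, 10), (1.5, 4)]          # (miles, miles per hour)
avg_speed = sum(d for d, _ in legs) / sum(d / v for d, v in legs)
print(mean_a, mean_b, round(avg_speed, 2))
```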

45. (a) A motor car covered a distance of 50 miles four times. The first
time at 50 m.p.h., the second at 20 m.p.h., the third at 40 m.p.h., and the fourth
at 25 m.p.h. Calculate the average speed and explain the choice of the average.
(C.A., Nov., 1967)
[H.M. = 29.63]
(b) A man gets three annual raises in salary. At the end of the first year
he gets an increase of 4%, at the end of the second an increase of 6% on his salary as
it was at the end of the first year, and at the end of the third year an increase of 9%
on his salary as it was at the end of the second. What is the average percentage
increase ? [6.3%]
(c) A machine is assumed to depreciate 40 per cent in value in the first
year, 25% in the second year and 10% per annum for the next three years, each
percentage being calculated on the diminishing value. What is the average percentage
depreciation for the five years ? (M. Com.)
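Part (a) calls for a harmonic mean because the distance is fixed. Parts (b) and (c) call for geometric means of the growth factors (1 + r) and (1 - r), since each percentage is applied to the previous year's value. A sketch using the `statistics` module:

```python
import statistics

hm_speed = statistics.harmonic_mean([50, 20, 40, 25])            # (a)
raise_factor = statistics.geometric_mean([1.04, 1.06, 1.09])     # (b)
avg_raise_pct = (raise_factor - 1) * 100
keep_factor = statistics.geometric_mean([0.60, 0.75, 0.90, 0.90, 0.90])   # (c)
avg_depreciation_pct = (1 - keep_factor) * 100
print(round(hm_speed, 2), round(avg_raise_pct, 1), round(avg_depreciation_pct, 1))
```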


46. (a) Mr. A spends Rs. 100 for apples costing Rs. 5 per kilogram and
another Rs. 100 for apples costing Rs. 4 per kilogram. What is the average price
of apples per kilogram ? [Average price = Rs. 4.44]
(b) Three men take 12, 8, 6 hours respectively to husk an acre of corn.
Determine the average number of hours to husk an acre. [X̄ = 8.67]
(c) In a moderately skewed distribution the arithmetic mean = 24.6 and the
mode = 26.1. Find the value of the median and explain the reason for the method
employed. (C.A., Nov., 1967)
[Med. = 25.1]


47. (a) Average monthly production of cotton piecegoods in India for the
first 8 months of 1958 was 409.8 million yds. and for the remaining 4 months
412.1 million yds. Calculate average monthly production for the year as a whole.
(M. Com., Delhi, 1968)
[410.57]
(b) Fifty items sold in Dept. ‘A’ of the corner store had a mean price of
30 cents ; 75 items sold in Dept. ‘H’ had a mean price of 20 cents. Find out the
mean price of commodities sold in Depts. ‘A’ and ‘H’.
(B.A. Hons., Econ., Delhi, 1965)
[24 cents]
(c) The mean age of a combined group of men and women is 30 years. If
the mean age of the group of men is 32 and that of the group of women is 27, find
out the percentage of men and women in the group. (C.A., 1969)
[Men = 60 per cent ; Women = 40 per cent]
(d) The average monthly sales for the first eleven months of the year in
respect of a certain salesman were Rs. 12,000, but due to his illness during the last
month, the average monthly sales for the whole year came down to Rs. 11,375.
What was the value of his sales during the last month ? (C.A., May, 1968)
[Rs. 4,500]
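All four parts rest on the combined-mean relation X̄12 = (N1 X̄1 + N2 X̄2) / (N1 + N2). It is used forwards in (a) and (b) and solved backwards in (c) and (d). The helper name is illustrative.

```python
def combined_mean(groups):
    """groups: iterable of (count, mean) pairs."""
    groups = list(groups)
    return sum(n * m for n, m in groups) / sum(n for n, _ in groups)

a = combined_mean([(8, 409.8), (4, 412.1)])       # million yds per month
b = combined_mean([(50, 30), (75, 20)])           # cents
# (c) 32p + 27(1 - p) = 30, solved for the proportion p of men
p_men = (30 - 27) / (32 - 27)
# (d) 12 x 11,375 = 11 x 12,000 + last month's sales
last_month = 12 * 11375 - 11 * 12000
print(round(a, 2), b, p_men, last_month)
```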


48. (a) A train starts from rest and travels successive quarter-miles at
average speeds of 12, 16, 24 and 48 miles per hour. The average speed over the
whole distance is 19.2 miles per hour and not 25 m.p.h. Explain and show how you
can verify the arithmetic. (B.A. Hons. Econ., Delhi, 1965)
(b) If oranges for one rupee are bought at 10 paise each and for another
rupee at 5 paise each, the average price would be 6⅔ paise and not 7½ paise.
Explain and verify.
(c) You take a trip which entails travelling 900 miles by train at an
average speed of 60 miles per hour, 3,000 miles by boat at an average of 25 miles
per hour, 400 miles by plane at 350 miles per hour and finally 15 miles by taxi at
25 miles per hour. What is your average speed for the entire distance ?
(M. Com., Agra, 1969)
[31.6 miles per hour]
[Hint : Weighted H.M.]
(d) A certain store made profits of Rs. 5,000, 10,000 and 80,000 in 1965,
1966 and 1967 respectively. Determine the average rate of growth of this store's
profits. [Hint : Take G.M. of 100 and 700, i.e., 264.6]
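Parts (a) to (c) are all weighted harmonic means: total distance (or money) divided by total time (or oranges bought). The arithmetic mean overstates the average because it gives the slow legs, which take longest, no more weight than the fast ones. A sketch with one helper; the name `weighted_harmonic_mean` is chosen for this example:

```python
def weighted_harmonic_mean(pairs):
    """pairs: (weight, rate). Returns total weight / total (weight / rate)."""
    return sum(w for w, _ in pairs) / sum(w / r for w, r in pairs)

train = weighted_harmonic_mean([(0.25, v) for v in (12, 16, 24, 48)])   # (a) m.p.h.
orange = weighted_harmonic_mean([(100, 10), (100, 5)])                  # (b) paise each
trip = weighted_harmonic_mean([(900, 60), (3000, 25), (400, 350), (15, 25)])  # (c)
print(round(train, 2), round(orange, 2), round(trip, 1))
```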
49. (a) From the information given below, find :
(i) Which factory pays a larger amount as daily wages ?
(ii) What is the average daily wage for the workers of the two
factories ?
Factory A Factory B
No. of wage earners 250 200
Average daily wage Rs. 2.0 Rs. 2.5
[(i) The total wage bill is the same in both the factories, i.e., Rs. 500.
(ii) Rs. 2.22]


(b) A cyclist covers his first three km. at an average speed of 8 km. per
hour, another two km. at 9 km. per hour and the last two km. at 2 km. per hour. Find
the average speed for the entire journey. (B. Com., Agra, 1972)
[4.3 km.]
(c) The average weight of a group of 25 boys was calculated to be 78.4
lb. It was later discovered that one weight was misread as 69 lb. instead of the
correct value 96 lb. Calculate the correct average. (B. Com., Poona, 1967)
[Correct X̄ = 79.5]


50. Figures concerning the number of deaths in two towns in a particular
year are given below :
Age group No. of persons Deaths No. of persons Deaths
(years) living (A) (A) living (B) (B)
0—10 500 100 12,000 4,800
10—20 3,500 150 6,000 360
20—30 7,000 200 9,000 480
30—40 10,000 300 25,000 250
over 50 19,000 750 48,000 576
Total 40,000 1,500 1,00,000 6,466
Compare the health conditions in both towns. (B. Com., Andhra, 1969)
[Taking A as standard population :
S.D.R. (A) = 37.5
S.D.R. (B) = 27.7
Hence B is healthier.]
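Direct standardisation applies each town's age-specific death rates to the same standard population, here town A's. The result is the expected deaths per thousand of that population. The sketch takes the second age group of town B as 6,000 persons, the figure that makes the printed total of 1,00,000. It gives about 27.8 per thousand for B (printed as 27.7) against 37.5 for A.

```python
standard = [500, 3500, 7000, 10000, 19000]      # town A population (standard)
towns = {
    "A": ([500, 3500, 7000, 10000, 19000], [100, 150, 200, 300, 750]),
    "B": ([12000, 6000, 9000, 25000, 48000], [4800, 360, 480, 250, 576]),
}

def standardised_death_rate(pop, deaths, standard):
    """Deaths per 1,000 of the standard population at the town's age-specific rates."""
    expected = sum(s * d / p for s, p, d in zip(standard, pop, deaths))
    return expected / sum(standard) * 1000

for town, (pop, deaths) in towns.items():
    print(town, round(standardised_death_rate(pop, deaths, standard), 2))
```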


51. The following table gives the mortality experience within the same
period in two places A and B. State which one is healthier and why.
Age in years Population A Population B
Population No. of Population No. of
size deaths size deaths
0—5 8,000 180 2,500 60
5—40 25,000 120 12,500 8
40—75 60,000 400 32,000 220
Over 75 7,000 420 3,000 200
[S.D.R. population A = 11.3
S.D.R. population B = 12.4
Town A is healthier.]


52. Draw one of the ogives for the following data and find :
(a) the median wage,
(b) the number of workers earning less than Rs. 55 per week.
Weekly wages No. of workers Weekly wages No. of workers
(Rs.) (Rs.)
0—20 41 60—80 38
20—40 50 80—100 7
40—60 64
(C.A., Nov., 1969)
[(a) Median wage = 42.81
(b) No. of workers earning less than Rs. 55 per week = 139]


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53. (a) Calculate airthmetic mean from the following data: 


Temp © No. of days Temp. C No. of days 
—40 to —30 10 0 to 10 65 
—30 to —20 28 10 to 20 180 
—20 to —10 30 20 to 30 10 
—10to 0 42 


(C. A. May, 1968) 
[X 4288] 


(b) The value of a machine decreases at a constant rate from the cost price
of Rs. 1,000 to the scrap value of Rs. 100 in ten years. Find the annual rate of
decrease, and the value of the machine at the end of one, two and three years.


[20.5%, Rs. 795, Rs. 632.03, Rs. 502.46]
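A constant rate of decline means the value is multiplied by (1 − r) each year. So 1,000 (1 − r)^10 = 100, which gives r = 1 − (100/1,000)^(1/10). The sketch computes the exact rate first. It then reproduces the printed values, which round r to 20.5% before working out the yearly values.

```python
cost, scrap, years = 1000, 100, 10

# scrap = cost * (1 - r) ** years  =>  r = 1 - (scrap / cost) ** (1 / years)
rate = 1 - (scrap / cost) ** (1 / years)
print(f"{rate:.2%}")                     # 20.57%


def value_after(t, r, start=cost):
    """Book value after t years of decline at the constant rate r."""
    return start * (1 - r) ** t


# The printed answers use the rounded rate of 20.5%.
for t in (1, 2, 3):
    print(t, round(value_after(t, 0.205), 2))
```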


54. Compare the mean and the mode as measures of central tendency.
Calculate the mean and the mode from the following:


Marks          No. of Students   Marks          No. of Students
Below 10             15          Below 50             96
Below 20             35          Below 60            127
Below 30             60          Below 70            198
Below 40             84          Below 80            250


(B. Com., Bangalore, 1968)
[X̄ = 50.4, Mo = 66.78]
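A "less than" table is turned into simple frequencies by differencing the cumulative counts. The mean then uses class mid-points, and the mode uses the usual interpolation L + (f1 − f0)/(2f1 − f0 − f2) × i. Here is a minimal sketch for Problem 54, assuming classes of width 10 running from 0 to 80.

```python
upper = [10, 20, 30, 40, 50, 60, 70, 80]           # 'marks below' limits
less_than = [15, 35, 60, 84, 96, 127, 198, 250]    # cumulative counts
width = 10

# Simple frequencies: differences of successive cumulative counts.
freq = [c - p for c, p in zip(less_than, [0] + less_than[:-1])]
mids = [u - width / 2 for u in upper]

n = sum(freq)
mean = sum(f * m for f, m in zip(freq, mids)) / n

# Interpolated mode in the class with the largest frequency.
k = freq.index(max(freq))
f1 = freq[k]
f0 = freq[k - 1] if k > 0 else 0
f2 = freq[k + 1] if k + 1 < len(freq) else 0
mode = (upper[k] - width) + (f1 - f0) / (2 * f1 - f0 - f2) * width

print(freq)                             # [15, 20, 25, 24, 12, 31, 71, 52]
print(round(mean, 2), round(mode, 2))   # 50.4 66.78
```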


55. Calculate the mean and median for the following:


Value   Frequency   Value   Frequency
0—4        328      30—39      598
5—9        350      40—49      524
10—19      720      50—59      378
20—29      664      60—69      244


(B. Com., Kerala, 1969)
[X̄ = 29.2, Med. = 27.1]


56. (a) Describe the various measures of central tendency of a frequency
distribution, pointing out their relative merits and demerits.


(b) Find the mean, median and modal ages of married women at first
child-birth.


Age at first child-birth   13   14   15   16   17   18   19   20   21   22   23   24   25
No. of married women       37  162  343  390  256  433  161  355   65   85   49   46   40


(I.C.W.A., Jan., 1970)
[X̄ = 17.72, Med. = 18, Mo = 18]
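For a discrete series, the mean is Σfx/N. The median is the value whose cumulative frequency first reaches (N + 1)/2, and the mode is the value with the largest frequency. With the counts as printed, this sketch gives a mean of about 17.71 (printed as 17.72), a median of 18 and a mode of 18.

```python
ages = list(range(13, 26))
women = [37, 162, 343, 390, 256, 433, 161, 355, 65, 85, 49, 46, 40]

n = sum(women)
mean = sum(a * f for a, f in zip(ages, women)) / n

# Median: first age whose cumulative frequency reaches (N + 1) / 2.
cum = 0
for age, f in zip(ages, women):
    cum += f
    if cum >= (n + 1) / 2:
        median = age
        break

mode = ages[women.index(max(women))]   # age with the largest frequency
print(n, round(mean, 2), median, mode)
```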


57. If a constant is added to or subtracted from each score in a series, what
will be the effect on the mean?


[When a constant is added, the average is increased
by the constant, and when a constant is deducted,
the average is reduced by the constant.]
58. Calculate the general death rate and standard death rate of town A
and compare them with the general death rate of town B.


Age          Local Population          Standard Population
Group        Town A                    Town B
(Years)      Population   Deaths       Population   Deaths
Under 10       20,000       600          12,000       372
10—20          12,000       240          30,000       600
20—40          50,000     1,250          62,000     1,612
40—60          30,000     1,030          15,060       525
Above 60       10,000       500           3,000       180
(B.A., Pilani, 1969)
[C.D.R. (A) = 26.67; S.D.R. = 25.29; C.D.R. (B) = 27.45]


59. (a) In a batch of 15 students, 5 students failed in a test. The marks
of 10 students who passed were 9, 6, 7, 8, 8, 9, 6, 5, 4, 7. What was the median


of the marks of all the 15 students? (B. Com., Bombay, 1970)
[Med. = 6]


(b) The mean monthly salary paid to 77 employees in a company was

Rs. 78. The mean salary of 32 of them was Rs. 75 and that of the other 25 was

Rs. 82. What was the mean salary of the remaining 20? (B. Com., A On

60. What are crude and standardised rates? Why is comparison on the
basis of standardised rates more reliable?

Calculate the crude and standardised death rates from the following data:


Age Group     Population   Death Rate   Standard Age
                           per 1,000    Distribution
0—10              400          40            600
10—20           1,500           4          1,000
20—60           2,400          10          3,000
60 and over       700          30            400


(B. Com., Bombay, 1969)


[C.D.R. = 13.4; S.D.R. = 14.1]
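When age-specific rates are given directly, the deaths in each group are population × rate/1,000. The crude rate divides total deaths by the actual population. The standardised rate applies the same rates to the standard age distribution instead. With the figures as printed, this sketch gives 13.4 and 14.0 per thousand.

```python
# (population, death rate per 1,000, standard age distribution) for Problem 60
rows = [(400, 40, 600), (1_500, 4, 1_000), (2_400, 10, 3_000), (700, 30, 400)]

actual_deaths = sum(pop * rate / 1000 for pop, rate, _ in rows)
crude_rate = 1000 * actual_deaths / sum(pop for pop, _, _ in rows)

expected_deaths = sum(std * rate / 1000 for _, rate, std in rows)
standardised_rate = 1000 * expected_deaths / sum(std for _, _, std in rows)

print(actual_deaths, round(crude_rate, 1), round(standardised_rate, 1))  # 67.0 13.4 14.0
```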
61. (a) How are the mean and median affected when it is known that for a
group of 10 students scoring an average of 60 marks, the best paper was wrongly
marked 80 instead of 75? [Hint. Median is not affected; correct X̄ = 59.5]
(b) The following distribution represents the number of minutes spent by
a group of teenagers in going to movies. What is the median?


Minutes/Week   Number of    Minutes/Week   Number of
               teenagers                   teenagers
0—99               27       400—499            58
100—199            22       500—599            32
200—299            65       600 and more        9
(B.A. Hons. Econ., Delhi, 1970)
[Med. = 333.5]
62. The following table shows the age distribution of persons in a particular
region.
No, of persons Age No. of persons 
И, (in Andr] (years) (in thousands) 
2 Below 50 14 
M 20 5 » 60 15 
7 30 E ER 155 
^ 40 12 70 and over 156 
(i) Find the median age.


(ii) Why is the median a more suitable measure of central tendency than
the mean in this case? (B.A. Hons. Econ., Delhi, 1971)
[Med. = 27]


63. Given below is the distribution of profits (in '000 rupees) earned by
94 Book Depots in a certain territory.


Profit       No. of Book Depots
Below 20            5
Below 30           14
Below 40           27
Below 50           48
Below 60           68
Below 70           83
Below 80           91
Below 90           94
Find the modal value.
(B. Com., Bombay, 1971) [Mode = 48.89]


64. Amend the following table and locate the median from the amended
table:


Size        Frequency   Size            Frequency
10—15          10       30—35              28
15—17.5        15       35—40              30
17.5—20        17       40 and onward      40
20—30          25


(B. Com., Nagpur, 1972)
[Take classes as 10—20, 20—30,
30—40 and 40 & onward;
Med. = 32.7]


65. Give a specific example of your own for each of the following cases:
(a) The median is preferred to the arithmetic mean.


(b) The geometric mean would be more satisfactory than the arithmetic
mean.


(c) The median would be preferred to the mode.
(d) The mode would be preferred to the median.
(e) The harmonic mean must be used instead of the arithmetic mean.


(f) No average would be meaningful.


66. The following figures represent the number of books issued at the
counter of a commerce college library on 12 different days:


96, 180, 98, 75, 270, 80, 102, 100, 94, 75, 200, 610.
Calculate the arithmetic mean, median and mode for this data. Which of
these would represent the above data best?
(B. Com., Bombay, 1970)
[X̄ = 165, Med. = 99, Mode = 75
Mean represents the data best]
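Python's standard `statistics` module covers all three measures for raw data. The sketch reads the seventh figure as 102. With the printed 10, the twelve figures total 1,888, which does not give the printed mean of 165.

```python
from statistics import mean, median, mode

books = [96, 180, 98, 75, 270, 80, 102, 100, 94, 75, 200, 610]

print(mean(books))     # 165
print(median(books))   # 99.0, average of the 6th and 7th values (98 and 100)
print(mode(books))     # 75
```

The single large value 610 pulls the mean above eight of the twelve daily figures. That is worth weighing when deciding which average represents the data best.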


67. Find the missing frequencies in the following distribution if N = 100


and median of the distribution is 30.
Marks     No. of Students
0—10            10
10—20            ?
20—30           25
30—40           30
40—50            ?
50—60           10
(M. Com., Delhi, 1969)


[The missing frequencies are 15 and 10 respectively]
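Solving for the missing frequencies uses the median formula backwards. With the median class 20-30, 30 = 20 + (50 − c.f.)/25 × 10 fixes the cumulative frequency below 20, and the total N = 100 then fixes the other gap. Here is a minimal sketch; the variable names are illustrative.

```python
N, median, width = 100, 30, 10
freq = {0: 10, 10: None, 20: 25, 30: 30, 40: None, 50: 10}   # keyed by lower limit

# median = L + (N/2 - cf) / f * width, with L = 20 and f = 25 for the class 20-30
L = 20
cf_below_L = N / 2 - (median - L) * freq[L] / width          # items below 20
f1 = cf_below_L - freq[0]                                    # the 10-20 class
f2 = N - sum(v for v in freq.values() if v is not None) - f1 # the 40-50 class

print(f1, f2)   # 15.0 10.0
```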


68. Following is the distribution of marks in law obtained by 50 students:


Marks (more than)   0    10   20   30   40   50
No. of Students     50   46   40   20   10    3


Calculate the median marks. If 60 per cent of the students pass this test,
find the minimum marks obtained by a pass candidate.


(B. Com., Delhi, 1972)
[Med. = 27.5]
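A "more than" table becomes a "less than" table by subtracting each count from N. After that, the median and the pass mark are both points on the ogive. The median cuts off N/2 students, and if 60% pass, the pass mark cuts off the lowest 40%. The sketch assumes the mark limits are 0, 10, 20, 30, 40 and 50.

```python
limits = [0, 10, 20, 30, 40, 50]         # 'more than' limits (assumed)
more_than = [50, 46, 40, 20, 10, 3]
width = 10
n = more_than[0]
below = [n - m for m in more_than]       # 'less than' counts: [0, 4, 10, 30, 40, 47]


def mark_below(k):
    """Mark below which k students lie, by linear interpolation on the ogive."""
    for i in range(len(limits) - 1):
        if below[i + 1] >= k:
            f = below[i + 1] - below[i]
            return limits[i] + (k - below[i]) / f * width
    raise ValueError("k falls in the open top class")


median = mark_below(n / 2)               # 27.5
pass_mark = mark_below(0.40 * n)         # 40% fail, so the lowest 20 are below it
print(median, pass_mark)                 # 27.5 25.0
```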


69. (a) The table below shows the number of skilled and unskilled workers
in two small communities, together with their average hourly wages:


                 Ram Nagar                  Shyam Nagar
Worker       Number   Wage per hour     Number   Wage per hour
Category
Skilled        150      Rs. 1.80          350      Rs. 1.75
Unskilled      850      Rs. 1.30          650      Rs. 1.25


Determine the average hourly wage for each community. Also give reasons why
the results show that the average hourly wage in Shyam Nagar exceeds the
average hourly wage in Ram Nagar, even though in Shyam Nagar the average
hourly wage of both categories of workers is lower.


[Average wage: Ram Nagar Rs. 1.375; Shyam Nagar Rs. 1.425]
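Each community's average is a weighted mean of the two category wages, weighted by the number of workers. The sketch shows why Shyam Nagar comes out ahead. 35% of its workers are skilled, against 15% in Ram Nagar, so the higher wage carries more weight.

```python
def weighted_mean(groups):
    """groups: (number of workers, hourly wage) pairs."""
    workers = sum(n for n, _ in groups)
    return sum(n * wage for n, wage in groups) / workers


ram_nagar = [(150, 1.80), (850, 1.30)]     # skilled, unskilled
shyam_nagar = [(350, 1.75), (650, 1.25)]

print(round(weighted_mean(ram_nagar), 3), round(weighted_mean(shyam_nagar), 3))  # 1.375 1.425
```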


(b) An investor buys Rs. 1,200 worth of shares in a company each month.
During the first 5 months he bought the shares at a price of Rs. 10, Rs. 12, Rs. 15,
Rs. 20 and Rs. 24 per share. After 5 months, what is the average price paid for


the shares by him?


(B. Com. Delhi, 1972;
B. Com. Pass, Delhi, 1973)
[Rs. 14.63]
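Spending an equal sum (Rs. 1,200) each month buys more shares when the price is low. The average price per share is therefore the harmonic mean of the prices, not their arithmetic mean. `statistics.harmonic_mean` has been in the standard library since Python 3.6. The sketch cross-checks it against total outlay divided by total shares.

```python
from statistics import harmonic_mean

prices = [10, 12, 15, 20, 24]            # Rs. per share in months 1 to 5
monthly_outlay = 1200

shares_bought = sum(monthly_outlay / p for p in prices)
average_price = monthly_outlay * len(prices) / shares_bought

print(round(harmonic_mean(prices), 2), round(average_price, 2))   # 14.63 14.63
```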


70. (a) An automobile runs the first 20 km. at the average speed of
30 km. per hour, the next 30 km. at the average speed of 40 km. per hour and the
next 50 km. at 60 km. per hour. Find the average speed of the automobile over
the first 100 km. of the journey.
(B. Com. Bombay, 1972)
[44.4]


(b) Average monthly production of minerals in the first eight months in
a country was 407.5 thousand tonnes. For the next four months the average was
412.5 thousand tonnes. Is it correct to say that the average monthly production
for the whole year was 410.0 thousand tonnes?
(B.A. Hons. Econ. Delhi, 1973)


[No, it is 409.17 th. tonnes]
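For (a), the average speed is total distance divided by total time, which is a harmonic mean of the speeds weighted by distance. For (b), the yearly average weights each period's mean by its number of months. Both take only a few lines:

```python
legs = [(20, 30), (30, 40), (50, 60)]    # (km, km per hour)
distance = sum(km for km, _ in legs)
hours = sum(km / speed for km, speed in legs)
print(round(distance / hours, 2))        # 44.44

# (b) weight each period's average by its number of months
total_output = 8 * 407.5 + 4 * 412.5     # thousand tonnes
print(round(total_output / 12, 2))       # 409.17
```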


R-38 * STATISTICAL METHODS 


71. (a) Compute mean and mode of the following distribution of marks
obtained by 125 students in Statistics at a certain examination:


Marks    No. of      Marks     No. of
         Students              Students
0—10        3        50—60        35
10—20       4        60—70        20
20—30       8        70—80        16
30—40      10        80—90         8
40—50      15        90—100        6
(B. Com. Poona, 1973) [X̄ = 55.8, Med. = 56.43, Mode = 55.71]


72. Marks of 25 students of a class are given below. You are required to
find mean, median and mode.


Roll No.  Marks   Roll No.  Marks   Roll No.  Marks
1          43     10         67     19         50
2          45     11         57     20         50
3          63     12         64     21         42
4          34     13         40     22         59
5          56     14         50     23         36
6          37     15         35     24         50
7          50     16         62     25         48
8          60     17         44
9          66     18         32


(B. Com. Raj. 1973)

[X̄ = Med. = Mo = 50]

73. From the following data calculate the measure of central tendency
which will appropriately describe the distribution:


Class Mark   462  480  498  516  534  552  570  588  606  624
Frequency 98 75 5 щі: 21 15 | Жас 
(B.A. Hons. Econ., Delhi, 1973)


74. Given below are the marks obtained by 100 students of a class in a
certain subject. Represent this information graphically and find out median
marks from the graph:


Marks out of 100   Number of Students
Less than 10              4
Less than 20              8
Less than 30             25
Less than 40             34
Less than 50             41
Less than 60             46
Less than 70             50
Less than 80             60
Less than 90             85
Less than 100           100


[Med. = 70]


75. A man purchases 5 rolls of ribbons each measuring 60 metres at the
rate of 4, 6, 10, 12 and 15 metres a rupee respectively. Find the average price of
ribbons. Will it make any difference if the man purchases ribbon worth Rs. 4
from each of the above rolls? (B. PTT

[9.4 & 7.5]
76. A laundry uses two different makes of washing machine. According
to its past experience, the following results have been recorded:


Make of    Median life    Mean life
machine
A          6,500 hours    6,000 hours
B          6,000 hours    6,500 hours
If both makes are of the same price, which make should the laundry
purchase from now on? Give reasons.


77. A transport company kept statistics for several years on two makes of
tyres. It found the following results:


Tyre    Median        Mean
A       20,000 km.    23,000 km.
B       23,000 km.    20,000 km.


Assuming that the two tyres sell at the same price, which make would you
advise the transport company to purchase and why?


78. In order to compare two brands of tyres, a research organisation tested
5 tyres of each kind, measuring the mileage for which each tyre gave adequate
service. The results of this test were: the tyres made by Firm A lasted 26,800,
22,300, 27,400, 24,000 and 23,500 km., while those made by Firm B lasted 25,600,
23,400, 21,000, 26,000 and 25,000 km. Comment on the claims made by both
firms that 'on the average' their tyres showed up better in this test.

(B.A., Bombay, 1973)


79. A travelling salesman made five trips during the past two months,
making the sales tabulated below:


Trip    No. of   Value of Sales   Sales per Day
        days     (Rs.)            (Rs.)
1         4          500              125
2         5          380               76
3         8          576               72
4         3          108               36
5         4          260               65
Total    24        1,824              374


The sales manager criticised the salesman's performance as not very good,
since his mean sales are only Rs. 74.8 (374/5 = 74.8). The salesman replied that
the sales manager was unfair in making such a statement, for his mean sales were
as high as Rs. 76 (1,824/24 = 76).

(a) What does each average mentioned here mean?
(b) Which mean seems to you appropriate in this case?
80. The following data relate to the thickness of 25 bomb bases (inches):


0.134 0.150 0.171 0.143 0.132
0 Mi 0.145 0.144 0.123 0.160
0.140 0.167 0.114 0.156 0.135
0.168 0.130 0.147 0.178 0.155
0.150 0.130 0.159 0.150 0.138


Classify into groups 01095-01193. Quod etc. зер е histo- 
lculate the mean and median from the frequency distribution. 
Mare d (B.A. Hons. Econ., Delhi, 1972) 


81. (a) Typist A can type a letter in 5 minutes, typist B in 10 minutes and
typist C in 15 minutes. What is the average number of letters typed per hour per
typist?


(b) A taxi ride in Delhi costs one rupee for the first kilometre and sixty


i i is i d at the 
i dditional kilometre, The cost for each kilometre is incurre | 
неее the kilometre, so that the rider pays for п hole, EU elle 191 5 
the average cost for 2} kilometres ? 5 : RA kde 
(c) If the price of a commodity doubles in a period of 4 years, what is the
average percentage increase per year? (B. Com.)
82. In the frequency distribution of 100 families given below, the number
of families corresponding to expenditure groups 20—40 and 60—80 are missing
from the table. However, the median is known to be Rs. 50. Find the missing
frequencies.


Expenditure       0—20   20—40   40—60   60—80   80—100
No. of families    14      ?       27      ?       15


(B. Com., Delhi, 1974)
[23 and 21]


83. Draw a curve on the graph paper from the data given below and
answer the following questions:


(a) What is the range of marks obtained by the middle 80% of the
students?


(b) What is the median mark?


Marks out of 60   No. of students
Less than 10             4
Less than 20            10
Less than 30            30
Less than 40            40
Less than 50            47
Less than 60            50
(B. Com., Rajasthan, 1974) [Med. = 27.5]
84. Calculate modal value from the following data:
Income (Rs.)    No. of persons
Less than 100         8
Less than 200        22
Less than 300        35
Less than 400        60
Less than 500        67
Less than 600        70
(B.A. Hons. Econ., Kurukshetra, 1975)


[Mode = 340]


85. The following table gives the distribution of 160 workers in a
factory:
Wages (Rs.)      No. of workers
More than 80          160
More than 90          151
More than 100         134
More than 110         104
More than 120          60
More than 130          29
More than 140          10
More than 150           0
Draw an 'Ogive Curve' from the above data and determine:
(i) The number of workers earning less than Rs. 135;
(ii) The number of workers earning at least Rs. 118; and
(iii) The median wage of the workers.
(B.A., Bombay, 1973) [Med. = 115.45]


(B.A. Hons. Econ., Delhi, 1975)
87. Calculate the median and mode for the distribution of the weight of
150 students from the data given below:


Weight in kg.   Frequency
30—40              18
40—50              37
50—60              45
60—70              27
70—80              15
80—90               8


(B.A. Hons. Econ., Delhi, 1975)
[Med. = 54.45; Mode = 53.08]


SECTION 8 
MEASURES OF VARIATION 


1. Explain the term dispersion. What purpose does a measure of dispersion
serve? Distinguish between absolute and relative measures of dispersion.
(B. Com., Poona, 1972)


2. (a) What are the requisites of a good measure of dispersion? In the
light of these, comment on some of the well-known measures of dispersion.
(I.C.W.A., 1967)


(b) What is dispersion? Explain what you understand by absolute and
relative dispersion. Describe some of the measures of relative dispersion known
to you. (B. Com., Bombay, 1973)
3. Describe the various measures of dispersion known to you and compare
their properties. (M. Com., Delhi, 1970)
4. Define 'Mean Deviation'. How does it differ from standard
deviation? (C.A., 1968; I.C.W.A., 1969)
5. Define the mean deviation, standard deviation and inter-quartile range
of a frequency distribution. Why is the standard deviation usually chosen as a
measure of dispersion? Give an example in which you would prefer an
alternative measure of dispersion. (B.A. Hons. Econ., Delhi, 1966)


6. Explain why standard deviation is regarded as superior to the other
measures of dispersion. What is its chief defect? (C.A., May, 1968)


7. Explain with suitable examples when you would use the range or the
standard deviation as a measure of dispersion. (B. Com., Delhi, 1969)


8. (a) What are quartiles? How are they used for measuring dispersion?
(C.A., 1967)


(b) Discuss briefly the relative merits of various measures of dispersion.
(B.A. Hons. Econ., Delhi, 1972)


9. What is Lorenz Curve? How is it drawn? In what way does it help
in studying variation of two or more distributions? Illustrate with the help of an
example.
10. Two random samples of the same size happen to have the same mean.
Would you therefore conclude that the two samples are equally good? Justify
your answer. (B. Com., Poona, 1971)


11. What is coefficient of variation? What purpose does it serve? Also
distinguish between 'variance' and 'coefficient of variation'.
(B. Com., Bombay, 1968)


12. Calculate range and its coefficient from the following data:
Price of gold per 10 gm. from Monday to Saturday
Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
 160       158        170         142       176       187
[Range = Rs. 45; Coeff. of Range = 0.137]
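The range is the largest value minus the smallest. Its coefficient divides that difference by the sum of the two values. Here is a short check of the printed answer:

```python
prices = [160, 158, 170, 142, 176, 187]   # Monday to Saturday

largest, smallest = max(prices), min(prices)
value_range = largest - smallest
coefficient = value_range / (largest + smallest)

print(value_range, round(coefficient, 3))   # 45 0.137
```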
13. Compute quartile deviation and mean deviation from the following
data:
Height in inches   No. of students   Height in inches   No. of students
58                       15          63                       22
59                       20          64                       20
60                       32          65                       10
61                       35          66                        8
62                       33
(B. Com., Agra, 1969; B. Com., Kurukshetra, 1974)
[Q1 = 60, Q2 = 61, Q3 = 63, M.D. = 1.74]
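In a discrete series, the quartiles are the values at positions (N + 1)/4, 2(N + 1)/4 and 3(N + 1)/4, located from the cumulative frequencies. The printed M.D. of 1.74 matches the mean deviation measured from the arithmetic mean (about 61.51). Measured from the median 61, it would be about 1.71. Here is a minimal sketch:

```python
heights = [58, 59, 60, 61, 62, 63, 64, 65, 66]
students = [15, 20, 32, 35, 33, 22, 20, 10, 8]
n = sum(students)                              # 195


def value_at(position):
    """First height whose cumulative frequency reaches the given position."""
    cum = 0
    for x, f in zip(heights, students):
        cum += f
        if cum >= position:
            return x


q1, q2, q3 = (value_at(k * (n + 1) / 4) for k in (1, 2, 3))
quartile_deviation = (q3 - q1) / 2

mean = sum(x * f for x, f in zip(heights, students)) / n
mean_deviation = sum(f * abs(x - mean) for x, f in zip(heights, students)) / n

print(q1, q2, q3, quartile_deviation, round(mean_deviation, 2))   # 60 61 63 1.5 1.74
```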


14. (а) How will you select an appropriate measure of variation ? 
(B. Com., Bombay, 1971) 


(b) What measures of dispersion will you choose for : 
(i) the movement of prices in a stock market, 
(ii) reporting on the rainfall in a certain region, and 


(iii) a frequency distribution with an open interval either at the
beginning or at the end?


15. From the following table compute the quartile deviation : 


Size Frequency Size Frequency 

4—8 6 24—28 12 

8—12 10 28—32 10

16 20 30 36—40 2 

> 36—40 

20—24 15 2 
(C.A., Nov. 1969)
[Q.D. = 5.2]
16. (a) For a series the mean deviation is 15. Find the most likely value
of quartile deviation. (B. Com., Agra, 1968)
[Q.D. = 12.5]


(b) Calculate the mean, standard deviation and the coefficient of variation
of the grades obtained by 20 students in an examination in Statistics:
62, 85, 73, 81, 74, 58, 66, 72, 54, 84 
$5 0909 $5. 58.85. 152, 88. MB Tl. 75 
(B. Com., Madras, 1967)
[X̄ = 70.99, σ = 11.45, C.V. = 16.15]
17. Ten students of the B. Com. class of a college have obtained the
following marks in Statistics out of 100 marks. Calculate the standard deviation
of marks obtained.


Serial No.   Marks   Serial No.   Marks
1              5     6              42
2             10     7              45
3             20     8              48
4             25     9              70
5             40     10             80


(B. Com., Osmania, 1969)
[σ = 23.06]


18. Calculate mean deviation for the following frequency distribution:
No. of colds experienced   No. of    No. of colds experienced   No. of
in 12 months               persons   in 12 months               persons
0                            15      6                            82
1                            46      7                            26
2                            91      8                            13
3                           162      9                             2
4                           110
5                            95
(I.C.W.A., 1968)
[M.D. = 1.466]
19. Calculate standard deviation from the following data:
Mid-points   Frequency   Mid-points   Frequency
1                2       6               155
2               60       7                79
3              101       8                40
4              152       9                 1
5              205
(I.C.W.A., 1969)
[σ = 1.57]


20. From the measurements given below, prepare a frequency distribution
table in exclusive form taking a regular class-interval of 2 units each, with 7 as
your starting point. From the table so prepared calculate the values of mean and
standard deviation.
Size of collars in inches

15.6 17.2 14.8 15.9 8.2 10.1 17.5 9.8 19.4 13.6 15.8

AE ES A | 1258 137 105 1 1178 16 177 

(B. Com., Agra, 1969)


[13.45]


21. The following table gives the heights of students in a class. Find out
the quartile deviation.


Height (in inches)   No. of Students   Height (in inches)   No. of Students
50—53                      2           59—62                     27
53—56                      7           62—65                     13
56—59                     24           65—68                      3


(B. Com., Raj., 1973)
[Q.D. = 2.21]
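For grouped data, each quartile is interpolated in the class that contains the N/4-th or 3N/4-th item, and Q.D. = (Q3 − Q1)/2. The sketch takes the 59-62 frequency as 27, the reading that reproduces the printed 2.21.

```python
# (lower limit, upper limit, frequency); the 59-62 frequency is read as 27
classes = [(50, 53, 2), (53, 56, 7), (56, 59, 24), (59, 62, 27), (62, 65, 13), (65, 68, 3)]


def grouped_quantile(p, classes):
    """Interpolate the value below which a fraction p of the items lie."""
    target = p * sum(f for _, _, f in classes)
    cf = 0
    for lo, hi, f in classes:
        if cf + f >= target:
            return lo + (target - cf) / f * (hi - lo)
        cf += f


q1 = grouped_quantile(0.25, classes)
q3 = grouped_quantile(0.75, classes)
print(round(q1, 2), round(q3, 2), round((q3 - q1) / 2, 2))   # 57.25 61.67 2.21
```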


22. Calculate the appropriate measure of dispersion from the following 


data : 
‚ of wi Wages in Rs. No. of wage 
мыз сыы No UNS per weei wor 
" 35 14 41—43 1 
aT , & over 43 7 
315 d: (І.С... 1971) 


[Coeff/ of Q.D. 07046] 
23. Calculate median and mean deviation for the following frequency
distribution:


Age (years)   No. of persons   Age (years)   No. of persons
1—5                 7          26—30              18
6—10               10          31—35              10
11—15              16          36—40               5
16—20              32          41—45               1
21—25              24


(B. Com., Lucknow, 1972)

[Median = 19.95, M.D. = 7.99]

24. From the following data compute the value of mean and standard
deviation:


Weight      Frequency   Weight      Frequency
(lbs.)                  (lbs.)
92—95          0.10     116—119        23.00
95—98          0.90     119—122        48.50
98—101         1.50     122—125        28.00
101—104        2.50     125—128        23.00
104—107        1.50     128—131        12.00
107—110       12.50     131—134         5.50
110—113       16.00     134—137         5.00
113—116       20.00


Mean height of the same 200 boys in inches is 63.02 with a standard deviation
of 3.00. Are these boys more variable in height or weight?
(B. Com., Madras, 1969)
[C.V. for height = 4.76, C.V. for weight = 5.96;
boys are more variable in weight]
25. Calculate mean deviation and standard deviation from the following
data:
Profits (in Rs.)     No. of firms   Profits (in Rs.)        No. of firms
5,000 to 6,000           10         0 to 1,000                    4
4,000 to 5,000           15         −1,000 to 0                   6
3,000 to 4,000           30         −2,000 to −1,000              8
2,000 to 3,000           10         −3,000 to −2,000             10
1,000 to 2,000            5


(B. Com., Madras, 1967)
[M.D. = 2,095, σ = 2,534]


26. Using Mean Deviation, compare which of the series is more variable
(use median and I)


A B A B 

48,224 2,962 42,624 4,222 

39,648 1,348 32213 6.498 

GR ved 72,642 9,981 

,692 ,349 ,427 

51507 Нее 68,34 3,42 
(C.A., 1968)


[Coeff. of M.D. series A = 0.267; Coeff. of M.D. series B = 0.409.
Hence, series B is more variable.]


27. (a) Calculate the standard deviation from the following data, using
the short method, and show symbolically the formula on which your calculations
are based.
Age (years)   Frequency   Age (years)   Frequency
20—29 3 50—59 53 
30—39 $1 60—69 19 
"is 223 70—79 4 
407-49 137 


(I.C.W.A., 1967)
[σ = 10.32]
(b) For a set of 100 observations, the sum of the deviations from 4 cm. is
−11 cm. and the sum of the squares of these deviations is 257 sq. cm. Find the
coefficient of variation. (I.C.W.A., 1967)

[C.V. = 41.11]
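With deviations d taken from an assumed mean A, X̄ = A + Σd/N and σ² = Σd²/N − (Σd/N)². The sketch reads the garbled sum of deviations as −11 cm.

```python
n, assumed_mean = 100, 4        # deviations measured from 4 cm
sum_d, sum_d2 = -11, 257        # sum of deviations and of squared deviations

mean = assumed_mean + sum_d / n
sd = (sum_d2 / n - (sum_d / n) ** 2) ** 0.5
cv = 100 * sd / mean

print(round(mean, 2), round(sd, 3), round(cv, 2))   # 3.89 1.599 41.11
```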

28. Find the standard deviation and the coefficient of variation from the
following data:


Wages            No. of workers   Wages            No. of workers
Up to Rs. 10          12          Up to Rs. 50          157
Up to Rs. 20          30          Up to Rs. 60          202
Up to Rs. 30          65          Up to Rs. 70          222
Up to Rs. 40         107          Up to Rs. 80          230


(B. Com., Lucknow, 1968)
[σ = 17.26, C.V. = 48.71]
29. The values of X̄ and σ of the following frequency distribution are
135.3 and 9.6 respectively:
d     −4   −3   −2   −1    0   +1   +2   +3
f      2    5    8   18   22   13    8    4   Total 80
Determine the actual class interval and the class limits.
[i = 6, A = 136.5; classes would be 109.5—115.5, 115.5—121.5, etc.]
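Step deviations satisfy X̄ = A + i·(Σfd/N) and σ = i·√(Σfd²/N − (Σfd/N)²). Knowing X̄ and σ therefore lets us solve for the class interval i first and the assumed mean A second. Here is a minimal sketch:

```python
d = [-4, -3, -2, -1, 0, 1, 2, 3]
f = [2, 5, 8, 18, 22, 13, 8, 4]
mean, sd = 135.3, 9.6

n = sum(f)
m1 = sum(fi * di for fi, di in zip(f, d)) / n          # -0.2
m2 = sum(fi * di * di for fi, di in zip(f, d)) / n     # 2.6

i = sd / (m2 - m1 ** 2) ** 0.5        # class interval
A = mean - i * m1                      # mid-point of the class with d = 0
first_class = (A + d[0] * i - i / 2, A + d[0] * i + i / 2)

print(round(i, 2), round(A, 2), [round(x, 2) for x in first_class])   # 6.0 136.5 [109.5, 115.5]
```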
30. The data given below relate to the heights and weights of 20 persons.
You are required to form a two-way frequency table with class-intervals 62" to
64", 64" to 66", and so on, and 115 to 125 lb., 125 to 135 lb., and so on.


S. No.   Weight   Height   S. No.   Weight   Height
1         170      70      11        163      70
2         135      65      12        139      67
3         136      65      13        122      63
4         137      64      14        134      68
5         148      69      15        140      67
6         124      63      16        132      69
7         117      65      17        120      66
8         128      70      18        148      68
9         143      7A      19        129      67
10        129      62      20        152      67
Using the standard deviation and its coefficient, state whether there is
greater variation in heights or weights. (C.A., 1972)
[C.V. (height) = 4.9%, C.V. (weight) = 9.77%;
there is greater variation in weights]
31. Suppose that a prospective buyer tests bursting pressure of the samples
of polythene bags received from two manufacturers A and B. The tests reveal
the following results:


Bursting    No. of Bags   No. of Bags   Bursting    No. of Bags   No. of Bags
pressure    (A)           (B)           pressure    (A)           (B)
5—10           2             9          20—25          54            32
10—15          9            11          25—30          11            27
15—20         29            18          30—35           5            13


Which manufacturer's bags, judging from these two samples, have the higher
average bursting pressure? Which of them is more uniform in bursting pressure?


(C.A., Nov., 1969)


[Manufacturer A: C.V. = 23.08, X̄ = 21.05;
Manufacturer B: C.V. = 32.35, X̄ = 21.86.
B's bags have higher average bursting pressure,
but the bags of A have more uniform pressure.]
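Comparing uniformity calls for the coefficient of variation, 100σ/X̄, computed from class mid-points. With the frequencies as printed, this sketch gives means of about 21.05 (A) and 21.86 (B). The C.V.s come out at about 23.2% and 32.4%, so the conclusion holds even though the printed C.V. for A is 23.08.

```python
mids = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5]        # classes 5-10 up to 30-35
bags = {"A": [2, 9, 29, 54, 11, 5], "B": [9, 11, 18, 32, 27, 13]}


def mean_sd(freq, mids):
    """Mean and population standard deviation of a grouped distribution."""
    n = sum(freq)
    m = sum(f * x for f, x in zip(freq, mids)) / n
    var = sum(f * (x - m) ** 2 for f, x in zip(freq, mids)) / n
    return m, var ** 0.5


for maker, freq in bags.items():
    m, s = mean_sd(freq, mids)
    print(maker, round(m, 2), round(100 * s / m, 1))
```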
32. Find the actual class groups from the information given below:
d    −3   −2   −1    0    1    2    3    4
f    15   15   23   22   25   10    5   10
You are told that X̄ of the distribution is 35.16 and σ = 19.76.

[A = 35, C = 10; class groups 0—10, 10—20, ..., 70—80]


33. Following are the marks obtained by two students A and B in 10 tests
of 30 marks each:


Tests d 4 4e 1 5.779. 39 
Marks obtained 

by А 80 76 48 S2 72 68 56 60 54 
Marks obtained 

by B ӘН 0. 6 6é9 7$ "S51 51 66 


If the consistency of performance is the criterion for awarding a prize, who
should get the prize? (B. Com., Poona, 1967)


[C.V. (student A) = 19.20
C.V. (student B) = 14.02
Prize should be given to B.]


34. Find which of the following batsmen is more consistent in scoring. 
Would you also accept him as a better batsman? Why?


Batsman 4: 5 ^7 16 27 39 8 $6 61 80 201 105 
жЕ 8 a Ie? BS at 4 5 7$ 8) 90 95 


Compare also the coefficient of quartile deviation of the two series.
(B. Com., Poona, 1958)
Batsman A: X̄ = 50, C.V. = 67.08, coeff. of Q.D. = 0.677
[ n  BX=48, C.V.—69552, coef. 26860617 
A is a better batsman and he is also more consistent.
35. The following table gives the distribution of households according to
size in two cities A and B:


Size of household   City A   City B   Size of household   City A   City B
1                     24       14     5                     13       14
2                     10       10     6                     10       11
3                     12       12     7                      6       10
4                     15       13     8                     10       16


Derive a measure to study the variability of the distribution.


(Econ., Delhi, 1967)

[C.V. for City A = 60%;
C.V. for City B = 51.59%]
36. Compute an absolute measure of variance and a relative measure of
dispersion from the following table:


Income in Rs.   No. of Families   Income in Rs.   No. of Families
300—399               30          700—799               60
400—499               46          800—899               50
500—599               58          900—999               20
600—699               76


(B.A. Hons. Econ., Delhi, 1966)
[c3—28,022776, C.V.— 2694] 
37. From the marks given below, obtained by two students in a course, find
out who was the more consistent.


A sg" 759 9760 65 1166 52. 75 3 
ue Aden i-o M НМУ E 
(M.A. Econ., Delhi, 1968)
[Student A, C.V. = 20.96; Student B, C.V. = 25.2;
Student A is more consistent.]
38. The following table gives the fluctuations in the prices of shares of two
companies, A and B. Find out which of them shows greater variability, and
comment on the result.


Price in Rs.              Price in Rs.
Shares A   Shares B       Shares A   Shares B
318        2,542          324        2,545
322        2,522          315        2,530
325        2,534          308        2,566
312        2,532          319        2,550


(B.A. Hons. Econ., Delhi, 1968; B. Com., Delhi, 1969)
[Shares A, C.V. = 1.75; Shares B, C.V. = 0.508;
shares of Co. A are more variable.]
39. Prices of a particular commodity in five years in two cities are given
below:


Prices in City A   Prices in City B   Prices in City A   Prices in City B
20                 10                 23                 12
22                 20                 26                 15
19                 18


Find from the above data the city which had more stable prices.
(B. Com., Andhra, 1968)


[C.V. (City A) = 11.13, C.V. (City B) = 24.6;
City A had more stable prices]


40. From an analysis of monthly wages paid to workers in two organisations
C and D, the following results were obtained:


                                         C       D
No. of workers                          550     600
Average monthly wages                    60     48.5
Variance of the distribution of wages   100     144
Obtain the average monthly wages and the variability in individual wages
of all the workers in the two organisations taken together. (I.C.W.A., 1969)
[X̄12 = 54; σ12 = 12.49]
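Combining groups needs each group's size, mean and variance. The combined variance adds up two things for every group: its own variance, and the squared distance of its mean from the combined mean. Here is a minimal sketch; the function name is illustrative.

```python
def combine(groups):
    """groups: (n, mean, variance) triples -> combined mean and standard deviation."""
    n = sum(k for k, _, _ in groups)
    mean = sum(k * m for k, m, _ in groups) / n
    var = sum(k * (v + (m - mean) ** 2) for k, m, v in groups) / n
    return mean, var ** 0.5


mean, sd = combine([(550, 60, 100), (600, 48.5, 144)])   # organisations C and D
print(round(mean, 2), round(sd, 2))                      # 54.0 12.49
```

The same function handles Problem 42(a) and (b) if squared standard deviations are passed in. For example, `combine([(40, 53, 19**2), (50, 53, 8**2)])` returns a standard deviation of 14.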


41. In two factories A and B engaged in the same industry in an area, the
average weekly wages in rupees and the standard deviations are as follows:


Factory   Average weekly wages   S.D.   No. of wage earners
A               34.5              5.0          476
B               28.5              4.5          524


(a) Which factory, A or B, pays out a larger amount as weekly wages?
(b) Which factory shows greater variability in the wage rate?
(c) Find the mean and S.D. of all workers in the two factories taken
together.
(d) What is the coefficient of variation in the case of each factory
separately? What inference do you draw from a comparison of the two figures?


(I.C.W.A., 1968; B. Com., Panjabi, 1975)


[Factory A: total wage bill Rs. 16,422; C.V. = 14.5]
42. (a) The means of two samples of sizes 50 and 100 respectively are 54.1
and 50.3 and the standard deviations are 8 and 7. Find the mean and the standard
deviation of the sample of size 150 obtained by combining the two samples.


(B. Com., Lucknow, 1967)
[X̄12 = 51.57, σ12 = 7.56]
(b) Two samples of size 40 and 50 respectively have the same mean 53,
but different standard deviations 19 and 8 respectively. Find the standard
deviation of the combined sample of size 90. (C.A. Nov.)
[σ12 = 14]
(c) For a group of 200 candidates the mean and standard deviation were
found to be 40 and 15. Later on it was discovered that the score 43 was misread as
53. Find the corrected mean and standard deviation corresponding to the
corrected figure. (I.C.W.A., 1972)
[Correct mean = 39.95;
Correct S.D. = 14.97]
(d) Mean and standard deviation of 200 items are found to be 60 and 20.
If at the time of calculations two items are wrongly taken as 3 and 67 instead of
13 and 17, find the correct mean and standard deviation.
[Correct X̄ = 59.8, Correct σ = 20.09]
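A mean and standard deviation can be corrected without the raw data. Rebuild Σx = N·X̄ and Σx² = N(σ² + X̄²), swap the wrong items for the right ones, and recompute. The second call assumes the true score in part (c) was 43, the value implied by the printed corrected mean. The function name is illustrative.

```python
def correct_mean_sd(n, mean, sd, wrong, right):
    """Recompute mean and SD after replacing the items in `wrong` with those in `right`."""
    total = n * mean + sum(right) - sum(wrong)
    total_sq = n * (sd ** 2 + mean ** 2) + sum(x * x for x in right) - sum(x * x for x in wrong)
    new_mean = total / n
    return new_mean, (total_sq / n - new_mean ** 2) ** 0.5


print(correct_mean_sd(200, 60, 20, [3, 67], [13, 17]))   # part (d): about (59.8, 20.09)
print(correct_mean_sd(200, 40, 15, [53], [43]))          # part (c): about (39.95, 14.97)
```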
43. Comment briefly on the following statements:
(a) The median is the point about which the sum of the squared deviations
is minimum.
(b) A computer found that the standard deviation of a set of 40 observations
whose values ranged between 116 and 136 is 22.


(c) The range is the most perfect measure of variability because it includes
all the measurements.

(d) After settlement the average weekly wage in a factory had increased
from Rs. 8 to Rs. 12 and the standard deviation had increased from 1 to 1.5. After
settlement the wage has become higher and more uniform.


44. From the following table giving data regarding income of workers in
two factories, draw a graph (Lorenz Curve) to show which factory has greater
inequalities of income:


Income Rs. Factory A Factory В Income Rs, Factory A Factory B 


Below 500 — 6,000 53000 2,000—3,000 1,500 2,200 
500—1,000 4,250 00 3,009— Е 
Сою) 4258 4500 009—4,000 650 1,500 


Income Factory A 
(Кз) actory Factory B 
Below 200 7,000 
200— 500 1.000 1200 
500—1,000 1,200 1,500 
1,000—2,000 ? 8o "400 
2,003—3,000 500 200 


(B. Com , Agra, 1965) 
46. Comment on the statement “After settlement, the average weekly wage
in a factory had increased from Rs. 8 to Rs. 12 and the standard deviation had
increased from 2 to 2.5. After settlement, the wage has become higher and more
uniform.” (B.A. Hons. Econ., Delhi, 1972)
47. A distribution consists of three parts, characterised as follows:
Parts   Number of items   Arithmetic average   S.D.
1             200                 5              3
2             250                10              4
3             300                15              5
Show that the arithmetic average of the whole distribution is 16 and its
standard deviation is 7.2 approximately. (M.A. Econ., Agra, 1966)
48. Find mean and standard deviation for the following data:
Wages    No. of Labourers   Wages    No. of Labourers
(Rs.)                       (Rs.)
30—32         12            38—40         12
32—34         18            40—42          8
34—36         16            42—44          6
36—38         14
Hence obtain the coefficient of variation. (B.A. Kerala, 1969)


49. From the prices of shares X and Y, given below, state which is more
stable in value, by calculating coefficient of variation:


ERA MALT EY 54.525 5321125607 hE, Бэ P SO DUST T 02,49. 
Y 18 107 105 105 106 107 104 105 104 101 


(M.A, Econ., Jabalpur, 1973 ; B. Com., Rajasthan, 1974) 


[Shares X, C.V. = 4.99
Shares Y, C.V. = 2.90; Shares Y are more stable]


50. Find the value of coefficient of variation in the following cases:
(i) S.D. = 3.5, n = 10, Σx = 145.
(ii) Variance = 148.6, Mean = 40. (B. Com., Mysore, 1968)
[(i) C.V. = 24.14; (ii) C.V. = 30.5]
51. Given the following data, determine the mean and standard deviation
of the combined set:


Group   No. of items (n)   Mean (X̄)   Variance (σ²)
I              25             25.2          4.90
II             30             21.5          6.25


(B.A., Madras, 1969)
[X̄12 = 23.18, σ12 = 3.005]


52. Calculate the standard deviation and the coefficient of variation from 
the following data : 


Age No. of persons Age No. of persons
20—30 3 60—70 140 
30—40 61 70—80 51 
40—50 132 80—90 2 
50—60 153 


(B. Com., Nagpur, 1969)
[C.V. = 21.7]


53. Goals scored by two teams A and B in a football match were as
follows:


No. of goals scored   No. of matches
in a match            A       B
0                     21      17
1                      9       9
2                      8       6
3                      5       5
4                      4       3
By calculating the coefficient of variation in each case, find which team
may be considered more consistent. (M.A. Econ., Meerut, 1975)
[Team A: C.V. = 123.9; Team B: C.V. = 109.2;
Team B is more consistent.]


54. (a) Explain the terms (i) standard deviation and (ii) coefficient of
variation.


(b) The scores of two batsmen, A and B, in ten innings during a certain
match are as under:

A: 32 28 47 63 71 39 10 60 96 14 

B: 19 31 48 53 67 90 10 62 40 80


Find which of the batsmen is more consistent in scoring. 
(I.C.W.A., Jan., 1970)


[Batsman A: C.V. = 55.4; Batsman B: C.V. = 48.8;
Batsman B is more consistent.]
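For ungrouped scores such as those in Exercise 54, the coefficient of variation can be checked with Python's standard `statistics` module, where `pstdev` divides by n to match the book's σ. Batsman B's list below is an assumed reading of a partly illegible row, so only A's figure (about 55.4) should be taken as confirmed by the printed answer.

```python
from statistics import mean, pstdev

def coefficient_of_variation(scores):
    # pstdev divides by n (population S.D.), the convention used in this book
    return pstdev(scores) / mean(scores) * 100

batsman_a = [32, 28, 47, 63, 71, 39, 10, 60, 96, 14]
batsman_b = [19, 31, 48, 53, 67, 90, 10, 62, 40, 80]   # assumed reading of the B row
for name, scores in (("A", batsman_a), ("B", batsman_b)):
    print(name, mean(scores), round(coefficient_of_variation(scores), 1))
```

The lower C.V. marks the more consistent batsman; the higher mean marks the better run getter.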


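55. Find the mean daily earnings and standard deviation of earnings from
the following data:


36 men get at the rate of Rs. 10 per man per day
40 men get at the rate of Rs. 11 per man per day
90 men get at the rate of Rs. 12 per man per day
138 men get at the rate of Rs. 13 per man per day
80 men get at the rate of Rs. 14 per man per day
61 men get at the rate of Rs. 15 per man per day
25 men get at the rate of Rs. 16 per man per day

(B.A., Pilani, 1969)

[X̄ = 12.99, σ = 1.55]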
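For a discrete series with frequencies, as in Exercise 55, the mean and σ are frequency-weighted. A minimal Python sketch, reading the ditto lines as rates of Rs. 11 and Rs. 13 for the 40 and 138 men; the computed mean is 12.998, which the printed answer truncates to 12.99.

```python
from math import sqrt

def weighted_mean_sd(values, freqs):
    """Mean and population S.D. of a discrete frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, sqrt(var)

rates = [10, 11, 12, 13, 14, 15, 16]   # Rs. per man per day
men = [36, 40, 90, 138, 80, 61, 25]
m, s = weighted_mean_sd(rates, men)
print(round(m, 3), round(s, 2))  # 12.998 1.55
```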


56. From the following table, calculate the values of:
(i) Mean
(ii) Standard Deviation
(iii) Coefficient of Variation


Income in Rs.   No. of persons
70—80           12
80—90           18
90—100          35
100—110         49
110—120         50
120—130         45
130—140         20
140—150          8


(B. Com., Mysore)


MEASURES OF VARIATION R-51 


57. (a) The numbers of runs scored by cricketers A and B during test
matches are shown below. Make a comparative study.


A + 20 90 76 102 90 6 108 20 1 
B 4023. 35...60. 62. 54,176 42, 30 30 2 < 


(B.A. Hons. Econ., Delhi, 1970)


[A: X̄ = 53.3, C.V. = 77%; B: X̄ = 45.3, C.V. = 37.3%.
A is a better run getter; B is more consistent.]


(b) The coefficients of variation of two series are 60% and 80%. Their standard
deviations are 20 and 16. What are their arithmetic means?
(B. Com., Delhi, 1970)


[X̄1 = 33.33, X̄2 = 20]
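Exercise 57(b) runs the C.V. formula backwards: X̄ = σ × 100 / C.V. A two-line Python check:

```python
def mean_from_cv(sd, cv_percent):
    """Invert C.V. = sd / mean * 100 to get mean = sd * 100 / C.V."""
    return sd * 100 / cv_percent

print(round(mean_from_cv(20, 60), 2), mean_from_cv(16, 80))  # 33.33 20.0
```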
58. Compute the mean and standard deviation from the following data:
Earnings   No. of employees

50—60       35
60—70       48
70—80       65
80—90       90
90—100     182
100—110    173
110—120    156
120—130    177
130—140     65
140—150     59

(B. Com., Nagpur, 1970)


[X̄ = 105.2, σ = 22.16]
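The step-deviation method for grouped data, as in Exercise 58, can be sketched as below. It assumes equal class widths, takes a central mid-point as the assumed mean, and returns the population standard deviation used throughout this chapter.

```python
from math import sqrt

def grouped_mean_sd(first_lower, width, freqs):
    """Mean and S.D. of a grouped series by the step-deviation method."""
    mids = [first_lower + width * (i + 0.5) for i in range(len(freqs))]
    a = mids[len(mids) // 2]                  # assumed mean: a central mid-point
    n = sum(freqs)
    d = [(x - a) / width for x in mids]       # step deviations
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = a + width * sum_fd / n
    sd = width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd

employees = [35, 48, 65, 90, 182, 173, 156, 177, 65, 59]   # 50-60, ..., 140-150
mean, sd = grouped_mean_sd(50, 10, employees)
print(round(mean, 2), round(sd, 2))  # 105.16 22.17
```

The same function reproduces σ = 6.25 in Exercise 72 (passing the class boundary 19.5 for the "last birthday" classes), σ = 279.11 in Exercise 75, σ ≈ 11.3 in Exercise 77 (reading the 45-54 frequency as 38) and σ ≈ 2.97 in Exercise 78.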


59. (a) Calculate the coefficient of variation for the following age
distribution of 125 persons:
Age under (yrs.) 10 20 30 40 50 60 70 80 
Мо, of persons 1505301453 1:9 100 110 115 125 
(B. Com., Bombay, 1970)


Hint: First find the simple frequencies.
[C.V. = 56%]


60. (a) What do you understand by the term dispersion? Explain
briefly the various measures of dispersion, pointing out their applicability in
business.
(b) Two workers on the same job show the following results over a long
period of time:


                                   Worker A   Worker B
Mean time of completing the job
(minutes)                          30         25
Standard deviation (minutes)        6          4


(i) Which worker appears to be more consistent? Explain.
(ii) Which worker appears to be faster in completing the job?
Explain.

(M.B.A., Delhi, 1972)


[(i) B is more consistent (C.V. 16, compared with 20 for A).
(ii) B is faster, as he takes less time on average.]
61. The ages of twenty husbands and wives are given below. Form a two-way
frequency table showing the relationship between the ages of husbands and
wives with class intervals 20—25, 25—30, etc.


Calculate the Arithmetic mean and standard deviation of the two groups 
after the classification : 


S. No.   Age of    Age of   S. No.   Age of    Age of
         husband   wife              husband   wife
1        28        23       11       27        24
2        37        30       12       39        34
3        42        40       13       23        20
4        25        26       14       33        31
5        29        25       15       36        29
6        47        41       16       32        35
7        37        35       17       22        23
8        35        25       18       29        27
9        23        21       19       38        34
10       41        38       20       48        47


(C.A., May, 1972) 
33:35, 07725] 


62. Calculate the mean deviation and the standard deviation of the
following data:


Age (in yrs.)   No. of persons   Age (in yrs.)   No. of persons
0—10            15               40—50           25
10—20           15               50—60           10
20—30           23               60—70            5
30—40           22               70—80           10

(B. Com., Nagpur, 1971)
63. From the following data, calculate the mean and standard deviation:
Age group   No. of employees   Age group      No. of employees
Below 20    20                 40—45          109
20—25       26                 45—50           84
25—30       44                 50—55           66
30—35       60                 55 and above    10
35—40       101


What inference will you draw from the above? (B. Com., Delhi, 1971)
[X̄ = 39.55, σ = 9.55]
64. A purchasing agent obtained samples of incandescent lamps from two
suppliers. He had the samples tested in his own laboratory for length of life, with
the following results:


                           Samples from
Length of life in hours    Co. A    Co. B
700 and under 900          10        3
900 and under 1,100        16       42
1,100 and under 1,300      26       12
1,300 and under 1,500       8        3
Which company's lamps are more uniform? (M.B.A., Delhi, 1971)


[Mean life: Co. A = 1,106.7 hours, Co. B = 1,050.0 hours;
lamps of Co. B are more uniform.]


65. The following are some of the particulars of the distribution of weights
of boys and girls in a class:

              Boys     Girls
Number        100      50
Mean weight   60 kg    45 kg
Variance      9        4


(i) Find the standard deviation of the combined data.
(ii) Which of the two distributions is more variable?


66. Lives of two models of refrigerators in a recent survey are:


Life             No. of refrigerators
(No. of years)   Model A   Model B
0—2              5          2
2—4              16         7
4—6              13        12
6—8              7         19
8—10             5          9
10—12            4          1


What is the average life of each model of these refrigerators? Which
model has more uniformity? (M.B.A., Delhi, 1973)
[Model A: C.V. = 54.7%; Model B: C.V. = 36.2%;
Model B has greater uniformity.]
67. (a) What is a Lorenz curve? How is it constructed? What is
its use?
(b) The frequency distributions of marks obtained in (i) Mathematics (M)
and (ii) English (E) are as follows:


Mid-value   No. of students   No. of students
of marks    scoring in (M)    scoring in (E)
5           10                 1
15          12                 2
25          13                26
35          14                50
45          22                59
55          27                40
65          20                10
75          12                 8
85          11                 3
95           9                 1


Analyse the data by drawing the Lorenz curves on the same diagram, and
describe the main features you observe. (B.A., Kerala, 1970)
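One common way to construct the Lorenz curve asked for in Exercise 67 is to cumulate both the frequencies and the total marks in each class (frequency × mid-value), express each as a percentage of its grand total, and plot one against the other alongside the diagonal line of equal distribution. A sketch for the Mathematics column follows. It assumes the garbled mid-value after 65 is 75 and the frequency printed as "1t" is 11.

```python
from itertools import accumulate

def lorenz_points(mid_values, freqs):
    """Cumulative % of students (x) against cumulative % of total marks (y)."""
    totals = [m * f for m, f in zip(mid_values, freqs)]   # marks in each class
    cum_f = list(accumulate(freqs))
    cum_t = list(accumulate(totals))
    n, t = cum_f[-1], cum_t[-1]
    return [(100 * cf / n, 100 * ct / t) for cf, ct in zip(cum_f, cum_t)]

mids = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
maths = [10, 12, 13, 14, 22, 27, 20, 12, 11, 9]
for x, y in lorenz_points(mids, maths):
    print(f"{x:6.2f}% of students  {y:6.2f}% of marks")
```

The further a curve sags below the diagonal, the more unequal the distribution; repeating the call with the English frequencies gives the second curve.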
68. (a) Distinguish between absolute and relative measures of dispersion.


(b) From the data given below, state which series is more variable:


Variable   Series A   Series B

10—20      10         18
20—30      18         22
30—40      32         40
40—50      40         32
50—60      22         18
60—70      18         10

(C.A., Nov., 1971)


[Series A: C.V. = 32.7%; Series B: C.V. = 37.2%;
B is more variable.]
69. Calculate the arithmetic mean, median and standard deviation for the
following distribution:
Height (inches)   No. of men
60 less than 63 4 
63 , » 66 14 
66 , ,, 69 39 
Le ae cn 12 33 
TRE УЗ 8 
75 „ uy B 


(M.B.A., Delhi, 1972)
(3 —68:323, Med. 68:14, 02:79) 


70. Find out the arithmetic mean and standard deviation from the
following:


(B.Com., Pass, Delhi, 1972)
[X 72105, «—4:875] 
71. Two brands of tyres are tested with the following results : 


Life (thousands   Brand X   Brand Y
of miles)
20—25             1          0
25—27.5           7          4
27.5—30           15        20
30—31             10        32
31—32             15        30
32—33             17        12
33—34             13         2
34—35             9          0
35—37.5           8          0
37.5—40           2          0
40—45             3          0
Total             100       100


(a) Draw a histogram for each frequency distribution.
(b) Which brand of tyre would you use on your fleet of trucks, and why?


(c) If the law forbids truck tyres to be used for more than 30,000 miles,
how does that change your answer, if at all?
(B. Com., Delhi, 1971)
[(a) Adjust the frequencies, taking the class interval as one.

(b) Coefficient of variation: Brand X = 30%, Brand Y = 12%.
Brand Y tyres should be preferred.

(c) Brand X should be preferred to Brand Y.]


72. Calculate the standard deviation for the following data of the age
distribution of 160 persons and comment on it:


Age                Number of persons
(last birthday)
20—24               8
25—29              20
30—34              30
35—39              60
40—44              30
45—49              12


[σ = 6.25]
73. Which measure of dispersion would be most useful for:
(i) a social worker
(ii) an actuary
(iii) a public relations man for industry
(iv) a spokesman for organised labour?
(B.A. Hons. Econ., Delhi, 1971)


74. With the median as base, calculate the mean deviation and compare the
variability of the two series A and B.


Series A   Series B
3484       487
4572       508
4124        40
3682       382
5624       408
4388       266
3680       186
4308       218


(C.A., May, 1973)
[Coeff. of M.D.: Series A = 0.116;
Series B = 0.307]
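The coefficient of mean deviation in Exercise 74 is the mean absolute deviation about the median divided by the median. A sketch using the standard library: for Series A it gives 0.116, as printed. Series B as printed works out to about 0.41 rather than 0.307, which suggests a figure in that column may have been misread.

```python
from statistics import median

def coeff_md_median(values):
    """Mean deviation about the median, divided by the median."""
    md_ = median(values)
    mean_dev = sum(abs(x - md_) for x in values) / len(values)
    return mean_dev / md_

series_a = [3484, 4572, 4124, 3682, 5624, 4388, 3680, 4308]
print(round(coeff_md_median(series_a), 3))  # 0.116
```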
75. Compute the arithmetic mean, the median and the standard deviation for
the following frequency distribution giving the life (in hours) of 100 electric bulbs:


Life (in hours)   No. of bulbs

800—1000           6
1000—1200         10
1200—1400         24
1400—1600         30
1600—1800         20
1800—2000          6
2000—2200          4

[X̄ = 1,464; Med. = 1,466.67; σ = 279.11] (B. Com., Poona, 1973)
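For grouped data, the median in Exercise 75 follows the interpolation formula Med. = L + (N/2 − c.f.)/f × i. A minimal Python sketch, assuming equal-width classes:

```python
def grouped_median(first_lower, width, freqs):
    """Median of an equal-width grouped series by linear interpolation."""
    n = sum(freqs)
    cum = 0                                    # cumulative frequency before class i
    for i, f in enumerate(freqs):
        if cum + f >= n / 2:                   # this is the median class
            lower = first_lower + i * width
            return lower + (n / 2 - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

bulbs = [6, 10, 24, 30, 20, 6, 4]   # 800-1000, ..., 2000-2200 hours
print(round(grouped_median(800, 200, bulbs), 2))  # 1466.67
```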


76. The following are the scores of two batsmen A and B in a series of
innings:
A 12 115 6 73 7 19 119 36 84 29
B 47 12 76 42 4 51 37 84 43 0
Who is the better run getter?
Who is more consistent?
(B. Com., Agra, 1973)
[Batsman A: X̄ = 50, C.V. = 83.7;
Batsman B: X̄ = 33, C.V. = 70.8;
A is the better run getter; B is more consistent.]


77. Calculate the standard deviation from the following data of the age
distribution of 100 persons:


Age (last birthday)   No. of persons
25—34                  4
35—44                 20
45—54                 38
55—64                 24
65—74                 10
75—84                  4


(B. Com., Poona, 1973)
[σ = 11.3]
78. Calculate the quartile deviation and standard deviation from the following
data:


Expenditure on food   No. of families of
(Rs.)                 factory employees
55.5—57.5              2
57.5—59.5              4
59.5—61.5              9
61.5—63.5             30
63.5—65.5             23
65.5—67.5             20
67.5—69.5              9
69.5—71.5              2
71.5—73.5              1

(B.A. Hons. Econ., Delhi, 1973)


[σ = 2.96]
79. An original frequency table with mean 11 and variance 9.9 was lost,
but the following table derived from it was found. Construct the original table.
Value   −2   −1   0   1   2
"5 1 6 7 4 2 


li=9, A-11; — 105 to —25, —2:510 65, 6'S to 15:5] (M. A. Econ. Meerut, 1972) 


80. For a group containing 100 observations, the arithmetic mean is 8 and σ = √10.5.
For 50 observations selected from these 100 observations, the mean and standard
deviation are 10 and 2 respectively. Find the arithmetic mean and standard
deviation of the other half. [X̄ = 6, σ = 3] (B.A. Hons. Econ., Delhi, 1973)
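Exercise 80 works by subtracting the known half's Σx and Σx² from the whole group's totals, using Σx² = n(σ² + X̄²) for each group. A sketch of that bookkeeping:

```python
from math import sqrt

def other_part(n, mean, sd, n1, mean1, sd1):
    """Mean and S.D. of the n - n1 observations left after removing a known part."""
    n2 = n - n1
    sum_x = n * mean - n1 * mean1
    # sum of squares of each group: n * (sd^2 + mean^2)
    sum_x2 = n * (sd ** 2 + mean ** 2) - n1 * (sd1 ** 2 + mean1 ** 2)
    mean2 = sum_x / n2
    return mean2, sqrt(sum_x2 / n2 - mean2 ** 2)

m2, s2 = other_part(100, 8, sqrt(10.5), 50, 10, 2)
print(round(m2, 6), round(s2, 6))  # 6.0 3.0
```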


81. A study of the age of 100 film stars grouped in intervals of 10—12,
12—14, ... etc. revealed the mean age and standard deviation to be 32.02 and
13.18 respectively. While checking, it was discovered that the age 57 was misread
as 27. Calculate the corrected mean age and standard deviation.

[Correct X̄ = 32.32, Correct σ = 13.40] (M.A. Econ., Punjab, 1973)
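The correction in Exercise 81 amounts to replacing 27 with 57 in both Σx and Σx², then recomputing the mean and σ. A minimal Python sketch:

```python
from math import sqrt

def correct_mean_sd(n, mean, sd, wrong, right):
    """Recompute mean and S.D. after one observation is corrected."""
    sum_x = n * mean + (right - wrong)
    sum_x2 = n * (sd ** 2 + mean ** 2) + (right ** 2 - wrong ** 2)
    new_mean = sum_x / n
    return new_mean, sqrt(sum_x2 / n - new_mean ** 2)

m, s = correct_mean_sd(100, 32.02, 13.18, wrong=27, right=57)
print(round(m, 2), round(s, 2))  # 32.32 13.4
```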

82. The number of observations, means and standard deviations of two
distributions are:


Number of observations   280   350
Mean (m)                  45    54
S.D. (σ)                   6     4


Find the mean and standard deviation of the distribution formed by the
two distributions taken together. [X̄ = 50, σ = 6.70] (B.A. Hons. Econ., Delhi, 1970)


83. Find the standard deviation and coefficient of variation of the distribution
tabulated:


Interval    Frequency   Interval    Frequency
3.00—3.25    6          4.00—4.25   47
3.25—3.50   19          4.25—4.50   29
3.50—3.75   35          4.50—4.75   15
3.75—4.00   44          4.75—5.00    5


(B.A. Hons. Econ., Delhi, 1972)
84. In the following data, two class frequencies are missing:


C.I.      Frequency
100—110    4
110—120    7
120—130   15
130—140    ?
140—150   40
150—160    ?
160—170   16
170—180   10
180—190    6
190—200    3


However, it was possible to ascertain that the total number of frequencies
was 150 and that the median had been correctly found as 146.25. You are
required to find, with the help of the information given:


(i) the two missing frequencies;
(ii) having found the missing frequencies, the arithmetic mean and
standard deviation;
(iii) without using the direct formula, the value of the mode.
[(i) 24, 25; (ii) X̄ = 147.33, σ = 19.2; (iii) Mode = 144.09] (C.A., May, 1973)
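The reasoning for Exercise 84 can be checked as follows. The sketch assumes the layout in which 140-150 has frequency 40 and the two missing frequencies belong to 130-140 and 150-160, the layout that yields the printed answers. The mode uses the empirical relation Mode = 3 Median − 2 Mean.

```python
from math import sqrt

# Class lower limits 100, 110, ..., 190 (width 10); None marks the missing frequencies.
freqs = [4, 7, 15, None, 40, None, 16, 10, 6, 3]
N, MEDIAN, WIDTH = 150, 146.25, 10
f1_plus_f2 = N - sum(f for f in freqs if f is not None)    # 150 - 101 = 49

# The median lies in 140-150 (frequency 40) and the cumulative frequency
# before it is 4 + 7 + 15 + f1 = 26 + f1, so
#   146.25 = 140 + (N/2 - (26 + f1)) / 40 * 10
f1 = N / 2 - 26 - (MEDIAN - 140) * 40 / WIDTH
f2 = f1_plus_f2 - f1
freqs[3], freqs[5] = round(f1), round(f2)

mids = [105 + WIDTH * i for i in range(len(freqs))]
mean = sum(f * x for f, x in zip(freqs, mids)) / N
sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / N)
mode = 3 * MEDIAN - 2 * mean          # empirical relation
print(round(f1), round(f2), round(mean, 2), round(sd, 2), round(mode, 2))
```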


85. Calculate the standard deviations and their coefficients in the following
two series and, on the basis of the results obtained, compare the variability of
one series with the other:


Series   Size of items
A         4   11   18   25   32
B        21   28   35   42   49


(B. Com., Kurukshetra, 1975)


[Coefficient of S.D.: Series A = 0.55; Series B = 0.283; Series A is more variable.]


86. From the following figures, determine the percentage of cases which
lie outside the limits X̄ ± σ, X̄ ± 2σ and X̄ ± 3σ.
115 117 121 125 116 120 118 117 119 166
122 124 123 118 120 18 126 127 122 123
(B.A., Bombay, 1975)
CHAPTER 9
SKEWNESS, MOMENTS AND KURTOSIS 


1. (a) What do you understand by skewness? How will you measure
skewness? (B. Com., Delhi, 1968; B.A. Hons. Econ., Delhi, 1969;
B. Com., Rajasthan, 1974)


(b) What do you understand by skewness? Draw the sketch of a
skewed frequency distribution and show the approximate positions of the mean,
median and mode. Give reasons why they will have the indicated positions.

(B.A. Hons. Econ., Delhi, 1974)


2. (a) What is skewness? How does it differ from dispersion? Describe
the various measures of skewness.
(B. Com., Nagpur, 1968; B. Com., Mysore, 1969)


(b) Explain the term 'skewness' as applied to a frequency distribution and
describe the various measures of skewness known to you. (B. Com., Bombay, 1972)


3. Explain the importance of measures of 'skewness' and 'dispersion', and
comment on the various measures of skewness. Which measure is generally
preferred and why? (B. Com., Madras, 1969)


4. (a) What are the tests of skewness?
(b) Distinguish between Pearson's and Bowley's measures of skewness.


5. Explain the terms coefficient of variation, skewness and kurtosis as
applied to frequency distributions. (M. Com., Delhi, 1969)


6. Averages, measures of dispersion and skewness are complementary to
one another in understanding a frequency distribution. Elucidate.
(B. Com., Bombay, 1968)
7. Define Pearson's measure of skewness. What is the difference between
a relative measure and a corresponding absolute measure of skewness?
(B.A. Hons. Econ., Delhi, 1970)


8. State the empirical relationship between mean, median and mode for
unimodal frequency curves which are moderately asymmetrical.


Show by means of sketches the relative positions of mean, median and
mode for frequency curves which are skewed to the right and left respectively.
(B. Com., Delhi, 1969)


9. Define 'moments'. How can you find out the skewness and kurtosis of a
distribution from moments about the mean? (M. Com., Delhi, 1965)


10. How are the central moments used in finding out the normality of a
distribution? (M. Com., Delhi, 1967)


11. What is kurtosis? How do the measures of kurtosis help in understanding
a frequency distribution?


12. Explain clearly how the moments help in determining the shape of a
particular frequency distribution. (M. Com., Delhi, 1970)


13. (a) Define Pearson's measure of skewness. What is the difference
between a relative measure and the corresponding absolute measure of skewness?
(B.A. Hons. Econ., Delhi, 1971)


(b) Explain clearly the terms 'skewness' and 'kurtosis'. (B. Com., Delhi, 1976)


(c) Point out the difference between dispersion and skewness.
(B. Com., Raj., 1973)


(d) Distinguish between 'skewness' and 'kurtosis' and bring out their
importance in describing frequency distributions. (B. Com., Delhi, 1975)


14. Explain the term 'skewness' as applied to a frequency distribution.
Calculate the measure of skewness for the following:


x    0    1    2    3    4   5   6   7
y   12   27   29   19    8   4   1   0
(B. Com., Madras, 1967)
[SK = 0]
15. Find out the coefficient of dispersion and coefficient of skewness from
the following table giving the wages of 230 persons:


Wages in Rs.   No. of persons   Wages in Rs.   No. of persons
70—80          12               110—120        50
80—90          18               120—130        45
90—100         35               130—140        20
100—110        42               140—150         8


(M.A. Econ., Punjab, 1970)
с 
—— 20156, SK-—93 
Б 6, S. 0 | 


16. Locate the mode and calculate the mean and standard deviation of the
following distribution and, using your results, comment on the skewness of the
distribution:
Scores   Frequency   Scores   Frequency
10—15    2           35—40    6
15—20    8           40—45    4
20—25    6           45—50    3
25—30    12          50—55    1
30—35    7           55—60    1


(M. Com., Delhi, 1966)
[X̄ = 30.1, Mo = 27.7, σ = 10.43, SK = 0.223]
17. Calculate Karl Pearson's coefficient of skewness from the following
data:


Wages (Rs.)   No. of workers   Wages (Rs.)   No. of workers
10—15          8               30—35         62
15—20         16               35—40         32
20—25         30               40—45         15
25—30         45               45—50          6
(C.A., 1966)
[SK = −0.222]
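Karl Pearson's coefficient (X̄ − Mo)/σ for a grouped series such as Exercise 17 can be sketched as below, with the mode interpolated in the class of highest frequency. It assumes equal class widths and a single clear modal class; for Exercise 17 it gives about −0.224, close to the printed −0.222.

```python
from math import sqrt

def pearson_skewness_grouped(first_lower, width, freqs):
    """Return (mean, mode, S.D., (mean - mode) / S.D.) for equal-width classes."""
    k_max = len(freqs)
    mids = [first_lower + width * (i + 0.5) for i in range(k_max)]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)
    # Modal class = class with the highest frequency; interpolate inside it:
    # Mo = L + (f1 - f0) / (2*f1 - f0 - f2) * width
    k = max(range(k_max), key=lambda i: freqs[i])
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k < k_max - 1 else 0
    mode = first_lower + k * width + (f1 - f0) / (2 * f1 - f0 - f2) * width
    return mean, mode, sd, (mean - mode) / sd

workers = [8, 16, 30, 45, 62, 32, 15, 6]   # wages 10-15, 15-20, ..., 45-50
mean, mode, sd, sk = pearson_skewness_grouped(10, 5, workers)
print(round(mean, 2), round(mode, 2), round(sd, 2), round(sk, 3))
```

Applied to Exercise 16 (reading the 40-45 frequency as 4) it gives X̄ = 30.1 and Mo ≈ 27.7, and to Exercise 28 it gives SK ≈ −0.33.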
18. (a) Compute the coefficient of skewness from the following information:
Median = 18.8 inches
Q1 = 14.6 inches
Q3 = 25.2 inches
[SK = 0.207]
(b) You are given the following information:
SK = 0.8, Mean = 40, Mode = 36.
Find the value of the standard deviation.
[σ = 5]


19. From the data given below, calculate Karl Pearson's coefficient of
skewness:


Marks No. of students 
Less than 10 5 
DCN A 12 
s w w 30 32 
5o АО 44 
#9 os) ЖЮ 50 


(B. Com.. Muro 1865 
[SK= #0086 у 


20. Examine for skewness the following frequency distribution of blood
donors:


Age          Frequency   Age          Frequency
(in years)               (in years)
10—19        3,620       50—59        2,710
20—29        7,045       60—69        1,200
30—39        9,030       70—79          175
40—49        5,025       80—89           11
(B. Com., Madras, 1969)
[SK = +0.14]


21. The following data represent the percentage of ash content in a
particular variety of coal as determined by tests on 280 wagon loads:


Percentage of   Frequency   Percentage of   Frequency
ash content                 ash content
Less than 6'0 0 10*0— 10:9 84 
60—6'9 1 110—119 45 
70—79 17 12-0 —12:9 28 
80—89 28 13:0—13:9 7 
9:0—9 9 7 140—149 2 


Calculate the quartile coefficient of skewness.
[SK = 0.05]


R-60 STATISTICAL METHODS 


22. Calculate Prof. Bowley's measure of skewness from the following
data:

Commission   No. of     Commission   No. of
payments     salesmen   payments     salesmen
(in Rs.)                (in Rs.)
110—115       4         135—140      90
115—120      10         140—145      52
120—125      26         145—150      33
125—130      49         150—155      17
130—135      72         155—160       1
(C.A., 1968)
[SK = −0.019]


23. From the following table, compute the quartile deviation as well as
the coefficient of skewness:


Size Frequency Size Frequency 

4—8 6 24-28 12 

8-12 10 28—32 10 
12—16 18 32—36 6 
16—20 3) 36—40 2 


(C.A., May, 1965)
[Q.D. = 5.21, SK = 0.188]


24. Particulars relating to the wage distributions of two manufacturing
firms are given below:


                     Firm A      Firm B
                     Rs.         Rs.
Mean                 75          80
Median               72          70
Mode                 67          62
Quartiles            62 and 78   65 and 85
Standard deviation   13          17


Compare the features of the two distributions.
(M. Com., Delhi, 1968)
[Firm A: C.V. = 17.3, SK = 0.615;
Firm B: C.V. = 21.25, SK = 1.06]
25. The following information was obtained from the records of a factory
relating to wages:


Arithmetic mean = Rs. 56.8
Median = Rs. 59.5
Standard deviation = Rs. 12.4
Give as much information as you can about the distribution of wages.
(C.A., 1969)
[C.V. = 21.8%, SK = −0.653]
26. The following series gives the heights of trees in a garden. From these
data, calculate Karl Pearson's coefficient of skewness.


Height (ft.)   Frequencies   Height (ft.)   Frequencies
Below 7         26           Below 35       216
Below 14        57           Below 42       287
Below 21        92           Below 49       341
Below 28       134           Below 56       360


(B. Com., Lucknow, 1959)
[SK = −0.256]


SKEWNESS, MOMENTS AND KURTOSIS R-61 


27. From the following data, calculate the coefficient of skewness based
upon the median and the quartiles, and comment on your result:


Age im years Мо, оў persons ^ Ageinyems No. of persons 


Less than 10 20 Less than 60 305 
5» ihe 5,220 65 EM Tg 361 
»  » 30 110, »  » 80 380 
soot v9 040) 185 2515,90 396 
eem 230 йлы 100 399 


(B. Com., Bombay, 1968) 
[Hint : Use formula 3X — Мей.) ; S.K,=0°05) 


28. From the data given below, calculate Karl Pearson's coefficient of
skewness and explain its significance:


Wages (Rs.)      70—80  80—90  90—100  100—110  110—120  120—130  130—140  140—150
No. of persons   12     18     35      42       50       45       20       8
(M.A. Econ., Punjab, 1972)
[SK = −0.33]


29. Calculate the second moment about the mean and the coefficient of
variation of the following distribution:


x   f      x   f
0    1     5   52
1    9     6   29
2   26     7    7
3   59     8    1
4   72


(M. Com., Delhi, 1968)
[μ2 = 1.979, C.V. = 35.5]


30. Calculate Bowley's measure of skewness from the following:


x       f
10—15    2
15—20    5
20—25    7
25—30   13
30—35   21
35—40   16
40—45    8
45—50    3


(C.A., Nov., 1969)
[SK = −0.055]
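Bowley's coefficient, SK = (Q3 + Q1 − 2 Med.)/(Q3 − Q1), for Exercise 30 can be sketched with a grouped-quantile interpolation of the same form as the median formula. Equal class widths are assumed.

```python
def grouped_quantile(first_lower, width, freqs, p):
    """Interpolated p-th quantile (p = 0.25, 0.5, 0.75) of an equal-width grouped series."""
    target = p * sum(freqs)
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= target:
            return first_lower + i * width + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie between 0 and 1")

def bowley(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 Med.) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

freqs = [2, 5, 7, 13, 21, 16, 8, 3]   # classes 10-15, 15-20, ..., 45-50
q1, md, q3 = (grouped_quantile(10, 5, freqs, p) for p in (0.25, 0.5, 0.75))
print(round(q1, 2), md, round(q3, 2), round(bowley(q1, md, q3), 3))
```

With the quartiles of Exercise 18(a) supplied directly, `bowley(14.6, 18.8, 25.2)` gives about 0.207.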


31. In a certain distribution the following results were obtained:
Mean = 45, Median = 48
Coefficient of skewness = −0.4


The person who gave you the data failed to give the value of the standard
deviation, and you are required to estimate it with the help of the available
information. (C.A., May, 1968)
[σ = 22.5]


32. (a) In a frequency distribution, the coefficient of skewness based upon
the quartiles is 0.6. If the sum of the upper and the lower quartiles is 100 and
the median is 38, find the value of the upper quartile. (B. Com., Bombay, 1969)

[Q3 = 70]


(b) A frequency distribution gives the following results:
(i) Coefficient of variation = 5
(ii) Standard deviation = 2
(iii) Karl Pearson's coefficient of skewness = 0.5
Find the mean and mode of the distribution. (B. Com., Bombay, 1967)
[X̄ = 40, Mo = 39]


(c) Find the coefficient of variation of a frequency distribution, given that
its mean is 120, mode is 123 and Karl Pearson's coefficient of skewness is −0.3.
(B. Com., Bombay, 1967)
[C.V. = 8.33]


(d) For a distribution, Bowley's coefficient of skewness is −0.36, Q1 = 8.6
and the median is 12.3. What is its quartile coefficient of dispersion?
(B. Com., Punjab, 1968)
[Coeff. of dispersion = 0.24]
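Each part of Exercise 32 rearranges a standard formula (C.V., Pearson's or Bowley's coefficient). A plain Python check of the arithmetic; note that part (c) works out to σ = 10 and C.V. = 10/120 × 100 ≈ 8.33.

```python
# (a) Bowley: 0.6 = (Q3 + Q1 - 2*38) / (Q3 - Q1), with Q3 + Q1 = 100
spread = (100 - 2 * 38) / 0.6            # Q3 - Q1 = 40
q3 = (100 + spread) / 2                   # 70

# (b) mean = S.D. * 100 / C.V.;  Pearson: mode = mean - SK * S.D.
mean_b = 2 * 100 / 5                      # 40
mode_b = mean_b - 0.5 * 2                 # 39

# (c) S.D. = (mean - mode) / SK;  C.V. = S.D. / mean * 100
sd_c = (120 - 123) / -0.3                 # 10
cv_c = sd_c / 120 * 100                   # about 8.33

# (d) solve SK = (Q3 + Q1 - 2 Med.) / (Q3 - Q1) for Q3, then
#     quartile coefficient of dispersion = (Q3 - Q1) / (Q3 + Q1)
q1_d, med_d, sk_d = 8.6, 12.3, -0.36
q3_d = (2 * med_d - q1_d - sk_d * q1_d) / (1 - sk_d)
qcd_d = (q3_d - q1_d) / (q3_d + q1_d)

print(round(q3, 2), mean_b, mode_b, round(cv_c, 2), round(q3_d, 2), round(qcd_d, 2))
```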
33. From the following table, compute the coefficient of skewness:


Marks   Frequency
        (No. of students)
0—5      4
5—10     6
10—15   10
15—20   16
20—25   12
25—30    8
30—35    4
(M. Com., Allahabad, 1969)


[SK = 0]


34. The following table gives the distribution of population in towns A and B
in different age groups:


Age groups     Population (in thousands)
               Town A    Town B
0—10           18        10
10—20          16        12
20—30          15        24
30—40          12        32
40—50          10        29
50—60           5        11
60—70           2         3
70 and above    1         1


Compare the skewness of the series.


(B. Com., Punjab, 1969)
[Coeff. of SK: Town A = 0.68]
35. The first three moments of a distribution about the value 3 of the
variable are 2, 10 and 30 respectively. Obtain the first three moments about zero.
Show also that the variance of the distribution is 6. (I.C.W.A.)


[ν1 = 5, ν2 = 31, ν3 = 201
(Variance = μ2 = μ2′ − (μ1′)² = 10 − 4 = 6)]
36. The first four moments of a distribution about the value 4 of the variable are
−1.5, 17, −30 and 108. Find the moments about the mean and about the origin.


[μ2 = 14.75, μ3 = 39.75, μ4 = 142.31;
ν1 = 2.5, ν2 = 21, ν3 = 166, ν4 = 1,132]


37. The first four moments of a distribution about x = 4 are 1, 4, 10 and 45.
Show that the mean is 5, and calculate the moments (i) about x = 0 and (ii) about
x = 5.
[Moments about the mean: μ2 = 3, μ3 = 0, μ4 = 26;
moments about zero: ν2 = 28, ν3 = 170, ν4 = 1,101]
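Exercises 35 to 39 all rest on the binomial shift between moments about different points: E[(X − b)^r] = Σ C(r, j) E[(X − a)^(r−j)] (a − b)^j. A sketch using `math.comb` (Python 3.8+); choosing b as the mean gives the central moments, and b = 0 gives moments about the origin.

```python
from math import comb

def moments_about(moments, a, b):
    """Convert the first r moments about the point a into moments about b.

    Uses  E[(X - b)^r] = sum_j C(r, j) * E[(X - a)^(r - j)] * (a - b)^j.
    """
    m = [1.0] + list(moments)          # m[0] = E[(X - a)^0] = 1
    h = a - b
    return [sum(comb(r, j) * m[r - j] * h ** j for j in range(r + 1))
            for r in range(1, len(m))]

# Exercise 37: moments about x = 4
about_4 = [1, 4, 10, 45]
mean = 4 + about_4[0]                  # first moment about 4 gives the mean, 5
central = moments_about(about_4, 4, mean)
about_zero = moments_about(about_4, 4, 0)
print(mean, central, about_zero)
```

From the central moments, β1 = μ3²/μ2³ and β2 = μ4/μ2²; for Exercise 39 these come out as 0 and 3.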


38. Obtain the measure of skewness from the following data and write a
note on the results obtained:
Mean = 0.2944; Median = −0.4018; Q1 = −1.4568; Q3 = 1.2316;


σ = 2.6408; μ3 = 3.854,
where μ3 represents the third central moment. (I.C.W.A., 1969)
[SK = 0.791 (Karl Pearson's method);
SK = 0.212 (Bowley's method);
γ1 = 0.209]


39. The first four moments of a distribution are 1, 4, 10 and 46
respectively.

Compute the first four central moments and the beta coefficients. Comment
upon the nature of the distribution. (M. Com., Delhi, 1967)
[μ1 = 0, μ2 = 3, μ3 = 0, μ4 = 27; β1 = 0, β2 = 3]


40. The first four moments of a distribution about the value 5 of the
variable are 2, 20, 40 and 50. Calculate the mean, variance, β1 and β2, and comment
upon the nature of the distribution. (M. Com., Delhi, 1970)
[X̄ = 7, μ2 = 16, β1 = 1, β2 = 0.63]


41. (a) Define moments. How will you use moments for testing the
symmetry and normality of a distribution? (M. Com., Delhi, 1971)
(b) State clearly how the measures of skewness and kurtosis can be used
in describing a frequency distribution. (M.B.A., Delhi, 1971)
42. Karl Pearson's coefficient of skewness of a distribution is +0.40. Its
standard deviation is 8 and the mean is 30. Find the mode and median of the
distribution. (B. Com., Delhi, 1971)
[Mode = 26.8, Med. = 28.9]
43. Calculate Karl Pearson's coefficient of skewness from the following
data:
Marks       No. of students
Above 0     150
» 10 140 
» 20 100 
» 30 780 
» 40 80 
» $0 € 70 
» 60 30 
va 70 А 14 
22/80 0 


(B. Com., Delhi, 1970 ; M.A. Econ. Jabalpur, 1974) 
[Coeff. of SK = −0.754]


44. (a) Explain the meaning of negative and positive skewness.


(b) Calculate the quartile coefficient of skewness of the following frequency
distribution:


Weight      No. of persons   Weight         No. of persons
(lbs.)                       (lbs.)
Under 100     1              150—159        65
100—109      14              160—169        31
110—119      66              170—179        12
120—129     122              180—189         5
130—139     145              190—199         2
140—149     121              200 and over    2


(B.A. Hons. Econ., Delhi, 1972)

45. Fill in the blanks:
(i) If β2 = 3, the curve is called ........
(ii) If β2 > 3, the curve is called ........
(iii) If β2 < 3, the curve is called ........


(B.A., Bombay, 1970)


46. From the following data, calculate the coefficient of skewness based
on the median and the quartiles:


Wages in Rs.   No. of workers
0—10            2
10—20          38
20—30          46
30—40          35
40—50          20
7 [SK - 0005] (B. Com., Bombay, 1971) 


47. For moderately skewed data, the arithmetic mean is 100, the coefficient
of variation is 35 and Karl Pearson's coefficient of skewness is 0.2. Find the
mode and the median.


(B. Com., Bombay, 1970)
[Med. = 97.67, Mode = 93]


48. The first three moments of a distribution about the value 2 of a
variable are 1, 16 and −40. Find the mean, the variance and μ3. Show that the
first three moments about zero are 3, 24 and 76. (B.A., Bombay, 1970)


49. A frequency distribution gives the following results:
(i) Coefficient of variation = 5
(ii) Standard deviation = 2
(iii) Karl Pearson's coefficient of skewness = 0.5
Find the mean and mode of the distribution. (B. Com., Bombay, 1972)
[X̄ = 40, Mode = 39]
50. Calculate the values of β1 and β2 from the following data:
X   10—  12—  14—  16—  18—  20—  22—  24—  26—
f   3    30   110  218  275  220  108  32   2
[β1 = 0, β2 = 2.7]


51. You are given the following information : 


X f x f 
40—41 1 49—50 29 
4\— 2 50— 19 
42— 4 51— 14 
43— T 52— 6 
44— 12 53— 4 
45— 20 54— ' 2 
46— 38 
47— 52 
48— 40 
Calculate the first four moments about 47.5. Convert these results into
moments about the mean and calculate β1 and β2. (B. Com., Bombay, 1970)
Va 75, 854, „= 00 
(nies E. ) 
6:00, 82=3'34 


52. The following data are given to an economist for the purpose of economic
analysis. The data refer to the length of life of a sample of Good Year tyres.


Do you think that the distribution is platykurtic?
1-100 5/й„3=2925°8 
2fd,—50 Zfd, 58665072 


zd, 319672 


53. Comment on the nature of the following distributions in respect of 
dispersion and skewness : 

Distribution І 14, 14, 14, 14, 14 
Makila Du уара 17 
ш 1, 3, 6 18, 4 


» 


(B. Com., Raj., 1973)


54. Calculate the Pearsonian measure of skewness for the following
distribution:


Size (inches)         30—33  33—36  36—39  39—42  42—45  45—48
No. of observations   2      4      26     47     15     6
[SK = −0.027] (B.A. Hons. Econ., Delhi, 1974)


55. Find out the mean wages and Karl Pearson's coefficient of skewness
from the following data:
35 men get at the rate of Rs. 4'5 per man 


0 „ » » 3:5 s 

48 „ » » 65 » \ 
10 „ » » T5, 
1257-5 » UE X NS 

87 s» » » 9'5 pun 

43 „ » vj, CODE үзө, 

2» eis элима: у» 


_ (B. Com., Rajasthan, 1974) 
(X 2806, Coeff. of S.K. —'245) 
56. It is known that the mean and median of a distribution are 3.0 and 4.0
respectively. Is the distribution skewed? (B.A. Hons. Econ., Delhi)






57. Calculate the first four moments of the following distribution about
x = 4 and thence find the moments about the mean of the distribution. Find also
the values of β1 and β2.


x   0   1    2    3    4     5     6     7    8    9    10
f   5   10   30   70   140   200   140   70   30   10   5
[μ2 = 2.7465, μ4 = 26.41, β1 = 0, β2 = 3.50]


58. The following table gives the heights of a batch of 100 students.
Comment on the kurtosis of the distribution:


Height in inches   59   61   63   65   67   69   71   73   75
No. of students     0    2    8   20   40   20    8    2    0
(I.C.W.A., 1974)
[β2 = 3.16]
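β1 = μ3²/μ2³ and β2 = μ4/μ2² for a frequency table such as Exercise 58 can be computed directly from the central moments. A minimal Python sketch for a discrete series (for a grouped series, pass the class mid-values); β2 above 3 indicates a leptokurtic curve.

```python
def beta_coefficients(values, freqs):
    """Central moments mu2..mu4 and Pearson's beta1, beta2 for a discrete series."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mu = {r: sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n
          for r in (2, 3, 4)}
    beta1 = mu[3] ** 2 / mu[2] ** 3
    beta2 = mu[4] / mu[2] ** 2
    return mu, beta1, beta2

heights = [59, 61, 63, 65, 67, 69, 71, 73, 75]
students = [0, 2, 8, 20, 40, 20, 8, 2, 0]
mu, b1, b2 = beta_coefficients(heights, students)
print(round(b1, 4), round(b2, 2))  # 0.0 3.16
```

The same function gives β1 = 0 and β2 ≈ 3.50 for Exercise 57 (reading the first frequency as 5 and the frequency at x = 4 as 140), and μ2 ≈ 1.979 for Exercise 29.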


59. (a) A sample can be described almost completely by the first four
moments and two measures based on the moments. Examine critically.

(M. Com., Delhi, 1975)

(b) How are moments used in describing a frequency distribution?

Explain with suitable examples. (M. Com., Delhi, 1975)


60. For a distribution, the mean is 10, variance is 16, γ1 is +1 and β2 is 4.
Obtain the first four moments about the origin, i.e., zero. Comment upon the
nature of the distribution. (M. Com., Delhi, 1975)


SECTION 10 
CORRELATION ANALYSIS 


1. (a) Explain what is meant by the correlation between two variables
and comment on its interpretation. Bring out the usefulness of the concept by
suitable examples. (B. Com., Delhi, 1972)


(b) Define Karl Pearson's coefficient of correlation. What is it intended to
measure? How would you interpret the sign and magnitude of a correlation
coefficient? (B. Com., Bombay, 1972)

2. (a) Explain the meaning and significance of the concept of correlation.

(B. Com., Mysore, 1966)
(b) Discuss the role of the concept of the correlation coefficient between two
variables in any empirical analysis. (B.A. Hons. Econ., Delhi, 1967)


3. (a) What is meant by correlation? Does correlation always signify a
cause and effect relationship between the variables? (B. Com., Madras, 1964)


(b) Even a high degree of correlation does not mean that a relationship
of cause and effect exists between the two correlated variables. Discuss.


(a) Positive correlation, and

(b) Perfect correlation?
What would you infer if rxy turns out to be zero?
5. Distinguish, giving suitable examples, between:
(i) positive and negative correlation;

(ii) linear and non-linear correlation;

(iii) simple, partial and multiple correlation.


CORRELATION ANALYSIS R-67


6. If two variates are independent, their correlation coefficient is zero.
Is the converse true? Explain by means of an example. (I.C.W.A., July, 1967)


7. Define 'coefficient of concurrent deviations' and comment on its
usefulness. (B. Com., Madras, 1968)


8. What is correlation? Explain how you will use the following methods
in determining correlation:

(a) Graph.

(b) Correlation table.

(c) Karl Pearson's coefficient of correlation.


9. Define the coefficient of correlation. What is it intended to measure?
How would you interpret the sign and magnitude of a calculated r? Consider in
particular the values r = +1, 0 and −1. (B.A. Hons. Econ., Delhi, 1969;
B. Com., Bombay, 1970)


10. (a) What is a 'scatter diagram'? How does it help us in studying the
correlation between two variables, in respect of both its nature and extent?


(M. Com., Delhi, 1967; B. Com., Poona, 1973)
(b) If two variables are independent, their correlation coefficient is zero.


Is the converse true? Explain by means of an example.
(B. Com., Bombay, 1971)


(c) Draw a scatter diagram to represent the following data : 


x 15 18 30 27 25 23 30 
f 7 10 1757016 12 13 9 


(d) Calculate the coefficient of correlation between x and y for the above
data. (B. Com., Poona, 1960)
[r = +0.632]
11. What is Spearman's rank correlation coefficient? Bring out its
usefulness. (B. Com., Poona, 1967)


12. Define covariance. How is it related to the correlation coefficient?
Why does the correlation coefficient always lie between −1 and +1? (B.A. Hons. Econ., Delhi, 1969)


13. (a) Explain the various properties of the correlation coefficient.
(I.C.W.A., 1967; Karnatak, 1968)


(b) Prove that the coefficient of correlation is independent of change of
scale and origin. (I.C.W.A., 1964)


(c) Prove that the coefficient of correlation always lies between −1 and +1.
(M. Com., Delhi, 1970; M.A. Econ., Delhi, 1968)


(d) What are the limits of the value of 'r'? What do positive, negative
and zero values of 'r' indicate? (B. Com., Mysore, 1970)


14. What is rank correlation? How does the coefficient of rank correlation
between two variables differ from Karl Pearson's coefficient of correlation?
(B. Com., Bombay, 1969)


15. What is meant by correlation? Give the general rules for interpreting
its value. (B. Com., Madurai, 1967)


R-68 STATISTICAL METHODS 


16. The following data give the height in inches (x) and the weight in lb.
(y) of a random sample of 10 students from a large group of age 17 years :


x 61 68 68 64 65 70 63 62 64 67 
па Рел 123 130 115 по 125 100 113 116 126 


(a) Represent the data by a scatter diagram and judge the nature of 
relationship and extent of correlation between x and y. 
(b) Calculate the product moment coefficient of correlation.
[r = 0.769]


17. Following are the heights and weights of 10 students of B. Com.
class :


Height (inches) 528 E 5 65 7% 6 63 60 72 


Weight (kg.) 50 65 63 50 54 60 61 55 54 65 
Draw a scatter diagram and indicate whether the correlation is positive
or negative. (B. Com., Delhi, 1968)


18. Calculate the coefficient of correlation between income and weight
from the following data. What conclusion do you draw from the estimate ?

Income (Rs.)   100   200   300   400   500   600
Weight (lb.)   120   130   140   150   160   170
[r = +1 ; Nonsense correlation]


19. Ten students secured the following marks in Statistics and Accountancy.
Find the coefficient of correlation and interpret it.

Student   Marks in      Marks in     Student   Marks in      Marks in
          Accountancy   Statistics             Accountancy   Statistics
   1          78            84           6         82            62
   2          36            51           7         90            86
   3          98            91           8         62            58
   4          25            60           9         65            53
   5          75            68          10         39            47

(B. Com., Lucknow, 1969)
[r = +0.78]
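The direct method used in Question 19 can be checked mechanically. The Python sketch below is illustrative only: `pearson_r` is a name chosen here, not taken from the text, and the function simply takes deviations from the actual arithmetic means, as in the hand computation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual arithmetic means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 items)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Question 19: marks of ten students
accountancy = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
statistics = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
print(round(pearson_r(accountancy, statistics), 2))   # 0.78

# Question 10 (c)-(d)
print(round(pearson_r([15, 18, 30, 27, 25, 23, 30], [7, 10, 17, 16, 12, 13, 9]), 3))  # 0.632
```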


20. Find Karl Pearson's coefficient of correlation between X and Y
from the following data :


X series 17 18 19 19 20 20 21 21 22 23 
семе «12^ 916: 544 Н 15 1. 01959-22 IG :15.24 20) 


(B. Com., Osmania, 1966)
[r = +0.62]

“Independence of X and Y implies zero correlation but not vice versa.”
Comment.


21. Find Karl Pearson's coefficient of correlation from the following
data :

Wages   Cost of Living   Wages   Cost of Living
 100          98            99          92
 101          99            97          95
 102          99            98          94
 102          97            96          90
 100          95            95          91

(B. Com., Osmania, 1966)
[r = +0.847]


CORRELATION ANALYSIS R-69 
22. The following are the results of H.S. School Examination :

Age of candidates   Percentage of failure   Age of candidates   Percentage of failure
       13                    39                    18                    39
       14                    40                    19                    48
       15                    43                    20                    47
       16                    34                    21                    54
       17                    36

Calculate the value of r and its probable error. (B. Com., Andhra, 1967)
[r = +0.686, P.E. = 0.119]
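For the probable error, the formula P.E. = 0.6745 (1 - r²)/√N can be coded directly. The sketch below applies it to Question 22. The function names are ours, and the three-way `interpret` rule (significant if r exceeds six times P.E., not significant if r is below P.E.) is the usual rule of thumb rather than part of the question. It gives r ≈ 0.685 against the printed 0.686, a rounding difference.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment r from sums of squares about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def probable_error(r, n):
    """P.E. of r = 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

def interpret(r, pe):
    """Rule of thumb: significant if |r| > 6 P.E., not significant if |r| < P.E."""
    if abs(r) > 6 * pe:
        return "significant"
    if abs(r) < pe:
        return "not significant"
    return "inconclusive"

ages = list(range(13, 22))                        # 13, 14, ..., 21
failures = [39, 40, 43, 34, 36, 39, 48, 47, 54]
r = pearson_r(ages, failures)
pe = probable_error(r, len(ages))
print(round(r, 3), round(pe, 3), interpret(r, pe))   # 0.685 0.119 inconclusive
```

The same rule, applied to the printed results of Question 28 (r = 0.682, P.E. = 0.12), also falls short of six times P.E.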


23. Relation between height and weight of a batch of students is given 
below in the following table : 


Weight (lb.) 100 105 104 107 111 115 125 130 132 135
Height (inches) AB, wH9:-50. 51: 521504354 USS) Абу 150 
Calculate coefficient of correlation. (B. Com., Andhra, 1966)

[r = +0.977]


24. Compute Karl Pearson's coefficient of correlation in the following
series relating to price and supply of a commodity :

Price   Supply   Price   Supply
(Rs.)   (Kg.)    (Rs.)   (Kg.)
 11      30       16      24
 12      29       17      24
 13      29       18      21
 14      25       19      18
 15      24       20      15
(B. Com., Andhra, 1966)

[r = -0.962]


25. If the coefficient of correlation between the annual value of exports
during the last ten years and the annual number of children born during the same
period is +0.8, what inference, if any, would you draw ? (M. Com., Agra, 1967)


26. The following marks were obtained by 12 students in Mathematics 
and Statistics : 


Students   Mathematics   Statistics   Students   Mathematics   Statistics


A 50 22 G 61 32 

B 54 25 H 65 30 

C 56 34 I 66 28

D 59 28 J 7A 34 

E 60 26 K 71 36

F 62 30 L 74 40


Find the correlation coefficient between the performance of the students in 
Mathematics and Statistics, as determined by these data. (B. Com., Delhi, 1967) 
[r = +0.783]




27. From the following data compute the coefficient of correlation between
X and Y :

                                                     X Series   Y Series
No. of items                                             15         15
Arithmetic mean                                          25         18
Sum of squares of deviations from arithmetic mean       136        138

Summation of products of deviations of X and Y series
from their respective arithmetic means : 122

(B. Com., Delhi, 1968)
(B. Com., Pass Delhi, 1973)

[r = +0.891]
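When only the deviation totals are supplied, as in Question 27, r follows from Σxy/√(Σx²·Σy²). The number of items and the means are not needed. A minimal sketch (the function name is ours):

```python
from math import sqrt

def r_from_deviation_sums(sum_xy, sum_x2, sum_y2):
    """r = Σxy / sqrt(Σx² · Σy²), with x, y measured from their own arithmetic means."""
    return sum_xy / sqrt(sum_x2 * sum_y2)

r = r_from_deviation_sums(sum_xy=122, sum_x2=136, sum_y2=138)
print(round(r, 3))   # 0.891
```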


28. The following table gives the results of matriculation examination
held in 1976 :


Age of candidates   Percentage of failures   Age of candidates   Percentage of failures
     (years)                                      (years)
      13-14                  39.2                  18-19                  39.2
      14-15                  40.6                  19-20                  48.9
lel 5 ж m 
- ү 1-2 4: 
1718 366 E 5 


Calculate Karl Pearson's coefficient of correlation and its probable error.
From your result, can you definitely assert that the failure is correlated with age ?

(M.A. Econ., Lucknow)
[r = +0.682, P.E. = 0.12]


29. The following table gives the marks obtained by a group of 12
students in two examinations A and B. Calculate the correlation coefficient
between the marks obtained in the two examinations and interpret the result.

Student   Marks in        Marks in        Student   Marks in        Marks in
          Examination A   Examination B             Examination A   Examination B
   1           15              18             7          20              18
   2           13              16             8          16              15
   3           17              18             9          18              21
   4           14              15            10          17              17
   5           18              19            11          19              18
   6           12              16            12          21              20

(B.A. Hons. Econ., Delhi, 1966)

[r = +0.703]


30. From the following table compute the coefficient of correlation
between Savings Bank deposits and strikes and lockouts over a period of
years :
Savings Deposits
(Rs. in lakhs) 51 54 56 59 65 60 70
Strikes and lockouts 38." 433 — 13627 7:33 9m23391^13 
(B. Com., Madras, 1966)
[r = -0.785]




31. Calculate Karl Pearson's coefficient of correlation between cost of
living and wages from the following data :

Year      Index of cost   Index of   Year      Index of cost   Index of
          of living       wages                of living       wages
1961-62       100            100     1966-67        96            121
1962-63       105            107     1967-68       107            125
1963-64       104            115     1968-69       112            128
1964-65       106            115     1969-70       118            133
1965-66        99            115     1970-71       123            135
(B. Com., Lucknow, 1972)

[r = +0.745]
32. Compute coefficient of correlation from the following data :

City   Population       Accident Rate   City   Population       Accident Rate
       (in thousands)   (per million)          (in thousands)   (per million)
 A          10               32          E          50               40
 B          20               20          F          60               28
 C          30               24          G          70               48
 D          40               36          H          80               44
(I.C.W.A., July, 1966)
[r = +0.714]


33. The following table gives the savings bank deposits in billions of 
dollars and strikes and lockouts in thousands over a number of years. Compute 
the correlation coefficient and comment on the result. 

Savings deposits 51 54 5:5 5'9 65 6'0 КУЗ 
Strikes and lockouts 38 44 33 36 33 23 10
(I.C.W.A., Jan., 1966)
[r = -0.822 ; Nonsense correlation]


34. The following table gives the birth rates and death rates of some
countries. Calculate the coefficient of correlation between birth rate and death
rate.

Country      A    B    C    D    E    F    G    H    I    J
Birth Rate  22   12   10   16   15    8    9   10    8   20
Death Rate  14    6    7   12   10    6    8    7    6    9

(B. Com., Marathwada, 1969)
[r = +0.84]


35. Calculate the correlation coefficient for the following data concerning 
marks in Statistics and Accountancy of 12 students : 


Statistics 52 74 93 55 41 23 92 64 40 т dE 
Accountancy 45 80 63 60 35 40 70 58 43 64 51 75 
(B. Com., Karnatak, 1968)

[r = +0.739]


36. Calculate coefficient of correlation (Pearson) from the following data :
x 22 35 13 19 33 58 31 22 29
y 20 21 34 32 24 33 48 29 25 29

(B. Com., Kerala, 1969)

[r = +0.953]




37. Calculate Karl Pearson’s coefficient of correlation between agricultural
and industrial production from the following data :

Year      Index No. of   Index No. of   Year      Index No. of   Index No. of
          agricultural   industrial               agricultural   industrial
          production     production               production     production
1961-62        98            112        1966-67       124            151
1962-63       102            113        1967-68       115            153
1963-64       114            117        1968-69       132            157
1964-65       117            129        1969-70       127            175
1965-66       117            139        1970-71       135            195
(B. Com., Nagpur, 1972)
[r = +0.88]


38. Calculate Pearson's coefficient of correlation from the bivariate sample 
of 50 distributed as below : 


| | 
NC X |30—35 35—40 40—45/45—50 50—55|55 — 6C. 


IN: 


110—120 
120—130 


(M. Com., Delhi, 1968)
[r = +0.431]
39. The following table presents 100 couples, classified according to the
ages of the parties at the time of marrying. Is there any correlation in their
ages ?
Husband's Age 
І 
/ 20-23 —30 | —40 | 24s | 


Am 3 2 


4128 | 6 4 


Wife's Age 
Op— | se— | 0t— |z- ves 


What light do these figures throw on the marriage customs of the people ?
[r = +0.613]




40. Calculate coefficient of correlation between the ages of husbands and
wives from the following data :


Age of          Age of wives

fides 18-20" tens “оку Meas” зз о Total 
20—25 9 5 — — — — 14 
25—30 — 18 10 — — — 28 
30—35 7 15 11 — — 33 
35 -40 — — 1 15 4 — 30 
40—45 un P 8 10 2 20 
45—50 - — -— 3 7 6 16 
50—55 — "n — — 4 5 9 
Total 9 30 36 37 25 13 150 


(B. Com., Nagpur, 1969)
[r = +0.722]


41. Calculate coefficient of correlation by concurrent deviation method
from the following data :

Year   Supply   Price   Year   Supply   Price
1954    150      200    1958    160      190
1955    154      180    1959    165      180
1956    160      170    1960    180      172
1957    172      160

(B. Com., Meerut, 1970)

[r_c = -1]
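Here is a sketch of the concurrent deviation method for Question 41. It assumes that a zero change (no movement) does not count as concurrent. Textbooks differ on this point, so adjust the rule if your convention is different. The function name is ours.

```python
from math import sqrt

def concurrent_deviation_r(xs, ys):
    """Coefficient of concurrent deviations, r_c = ±sqrt(±(2c - n) / n).

    n = number of pairs of deviations (one fewer than the observations),
    c = number of pairs in which both series move in the same direction.
    A zero change is treated as not concurrent (an assumption of this sketch).
    """
    def direction(a, b):
        return (b > a) - (b < a)          # +1 rise, -1 fall, 0 no change

    dx = [direction(a, b) for a, b in zip(xs, xs[1:])]
    dy = [direction(a, b) for a, b in zip(ys, ys[1:])]
    n = len(dx)
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)
    value = (2 * c - n) / n
    # The same sign is used inside and outside the square root.
    return sqrt(value) if value >= 0 else -sqrt(-value)

# Question 41: supply and price, 1954-1960
supply = [150, 154, 160, 172, 160, 165, 180]
price = [200, 180, 170, 160, 190, 180, 172]
print(concurrent_deviation_r(supply, price))   # -1.0
```

All six year-to-year changes move in opposite directions here (c = 0, n = 6), which is why r_c reaches -1.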


42. A computer, while calculating the correlation coefficient between two
variates x and y from 25 pairs of observations, obtained the following constants :


N = 25, ΣXY = 516, ΣX = 125, ΣY = 100, ΣX² = 650, ΣY² = 480.
It was, however, detected later on at the time of checking that he had 


copied down two pairs -> M for XB Obtain the correct value 
816 H 8 
of the correlation coefficient. (B.Sc., Madras, 1969) 


(re 07413) 


43. Find Karl Pearson’s coefficient of correlation from the following
series of marks secured by 10 students in a class-test in Mathematics and
Statistics :

Marks in Mathematics   45   70   65   30   90   40   50   75   85   60
Marks in Statistics    35   90   70   40   95   40   60   80   80   50

Also calculate the probable error. Assume 60 and 65 as working means
respectively.
(B. Com., Lucknow, 1966 ; B. Com., Delhi, 1970)

[r = +0.903]
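Question 43 asks for the short-cut method with working means 60 and 65. The sketch below (function name ours) implements the assumed-mean formula. Because r is independent of change of origin, any other working means give the same value, as the second call shows. The Mathematics marks used here are 45, 70, 65, 30, 90, 40, 50, 75, 85, 60. This reading of the damaged row reproduces the printed r = 0.903.

```python
from math import sqrt

def pearson_r_assumed_means(xs, ys, ax, ay):
    """Short-cut method with deviations dx = X - ax, dy = Y - ay from working means.

    r = (NΣdxdy - ΣdxΣdy) / sqrt((NΣdx² - (Σdx)²)(NΣdy² - (Σdy)²))
    """
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sdx, sdy = sum(dx), sum(dy)
    sdxdy = sum(a * b for a, b in zip(dx, dy))
    sdx2 = sum(a * a for a in dx)
    sdy2 = sum(b * b for b in dy)
    num = n * sdxdy - sdx * sdy
    den = sqrt((n * sdx2 - sdx ** 2) * (n * sdy2 - sdy ** 2))
    return num / den

maths = [45, 70, 65, 30, 90, 40, 50, 75, 85, 60]
stats = [35, 90, 70, 40, 95, 40, 60, 80, 80, 50]
r = pearson_r_assumed_means(maths, stats, ax=60, ay=65)
print(round(r, 3))   # 0.903
# Working means of zero (raw values) give the same r: change of origin does not matter.
r0 = pearson_r_assumed_means(maths, stats, ax=0, ay=0)
print(round(r0, 3))  # 0.903
```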
44. Calculate Karl Pearson’s coefficient of correlation between X and Y
by the short-cut method in the following series :


X 's0 50 '55 «o 65 65 65 $60 60 50 
E 132 14 —16- 16 35 19414 #38088: 


Use 60 and 15 as the assumed means.
(B. Com., Lucknow, 1967)


[r = +0.787]




45. The average prices of stocks and bonds listed on the Calcutta Stock
Exchange during 1950-59 are given below :

Year   Stock Prices   Bond Prices
           (Rs.)          (Rs.)
1950       35.22         102.43
1951       39.87         100.93
1952       41.85          97.43
1953       43.33          97.81
1954       40.06          98.32
1955       53.29         100.07
1956       54.14          97.08
1957       49.12          91.59
1958       40.71          94.85
1959       55.14          94.65

Find the correlation coefficient between stock and bond prices.
(B. Com., Lucknow, 1968)


46. (a) Calculate Karl Pearson's coefficient of correlation and its probable
error from the following data of imports and exports :

Year      Value of Imports   Value of Exports
1956-57          903                620
1957-58        1,036                625
1958-59          904                573
1959-60          961                640
1960-61        1,122                642
1961-62        1,092                661
1962-63        1,131                685
1963-64        1,190                793
1964-65        1,314                816
1965-66        1,350                805
(B. Com., Lucknow, 1967)
[r = +0.917 ; P.E. = 0.034]


46. (b) Calculate Karl Pearson's coefficient of correlation between index
A and index B as given below :

Index A : 169 182 182 192 198 209 227 238 250 253
Index B : 200 22 225 228 229 233 249 266 255 255

(M.A. Econ., Meerut, 1974) [r = +0.944]
47. Compute the coefficient of correlation from the following table :

Candidate No.          1   2   3   4   5   6   7   8   9  10
Marks in Mathematics  30  29  28  28  27  26  26  25  24  24
Marks in Statistics   23  22  22  20  21  20  17  18  17  14

Candidate No.         11  12  13  14  15
Marks in Mathematics  23  23  22  21  19
Marks in Statistics   17  14  15  16  14

(M. Com., Allahabad, 1970)
[r = +0.891]
48. Calculate the coefficient of correlation for the ages of husband and
wife :
Age of husband mE A ДЫ ДУ ANE HE У С E 48 4 1309 
Age of wife 18 22 23 24 25 26 28 29 30 32

(I.C.W.A., 1970)
[r = +0.995]

49. Find if there is any correlation between money in circulation and
cost of living from the following indices, supposing that the money in circulation
affects the cost of living the next year :

Year                           1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970
Index of money in circulation   110  120  125  128  132  140  145  148  150  152  154
Index of cost of living         105  108  110  112  108  130  130  132  134  135  136

(M.A. Econ., Punjab, 1965)

[r = +0.92]
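Question 49 assumes that money in circulation affects the cost of living of the following year. Each money index is therefore paired with the next year's cost index (1960-69 against 1961-70). Here is a sketch of that pairing, with `lagged_r` as a hypothetical helper name. It gives about 0.915, which the printed answer rounds to 0.92.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment r from sums of squares about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def lagged_r(cause, effect, lag=1):
    """Pair cause[t] with effect[t + lag] and correlate the overlapping years."""
    return pearson_r(cause[:len(cause) - lag], effect[lag:])

money = [110, 120, 125, 128, 132, 140, 145, 148, 150, 152, 154]   # 1960-1970
cost = [105, 108, 110, 112, 108, 130, 130, 132, 134, 135, 136]    # 1960-1970
r = lagged_r(money, cost, lag=1)   # money 1960-69 against cost 1961-70
print(round(r, 3))   # 0.915
```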


50. Apply concurrent deviation method to determine the value of
coefficient of correlation from the data given below :

 x     y     x     y
100   120   120   160
110   140   122   130
1¢0   160   125   110

[r = -0.775, n = 5]


51. From the following data of the indices of supply and price for
12 years find out the coefficient of correlation of short-term oscillations assuming
a three-yearly moving average. [Ignore decimals.]

Year   Indices of   Indices of   Year   Indices of   Indices of
       Supply       Price               Supply       Price
1964 90 101 1970 105 135 
1965 96 105 1971 108 133 
1966 99 115 1972 118 142 
1967 11 117 1973 127 152 
1968 114 123 1974 120 156 
1969 102 123 1975 110 160 
(r==0°452} 


52. Calculate the coefficient of rank correlation from the following data : 


x 48 33 40 9 16 16 65 24 16 е7, 
Fim :033 уй: B 40024, oa Aaa 2820.11.92. 2-9 v 19, 
(M.A. Econ., Agra, 1967)
[гь = 70:733) 
53. From the following data calculate the coefficient of rank correlation 
between x and y. 
6 20 65 42 33 44 50 15 60 
» 30 * 70 25 58 75 60 45 80 38 


(B. Com., Bombay, 1968)
[re -- 0 927] 


54. (a) "Correlation analysis between different time series must be taken
with a grain of salt." Comment.
(b) "A correlation coefficient of -0.5 does not mean that 50% of the data
are explained." Comment. (B.A. Hons. Econ., Delhi, 1970)
[(b) Hint : Find r². Only 25% of the data are explained.]


i ion coefficient of r-0'Gindicates а relationship 
twice керы pun mere (M.A, Econ., Delhi, 1966) 


55. (a) The coefficient of rank correlation of a bivariate data is 1. Does
it imply perfect correlation between the variables ? Give reasons for your
answer.

(b) If the product moment coefficient of correlation is 0, does it mean
that the variates are independent ? Is the converse true ?
(B. Com., Bombay, 1970)


56. Calculate from the following data, the value of 'r' between total 
cultivable area and area under wheat : 
TOTAL CULTIVABLE AREA (IN BIGHAS) 


Area under
wheat (bighas)   0-500   500-1,000   1,000-1,500   1,500-2,000   2,000-2,500
0—200 12 6 um с E 
200— 400 4 18 4 2 2 
400 600 4 7 H Eos 
600 — 800 3 -= 2 2 
800—1,000 = 2 2 


(B. Com., Mysore, 1970) 
[r = +0.681]


57. Three judges rank the 10 entries in a beauty contest as follows :

Entry    A    B    C    D    E    F    G    H    I    J
X        8    7    5    4    9   10    6    2    1    3
Y        7    8    9    3   10    6    5    4    2    1
Z       10    9    8    7    6    5    3    4    1    2

Which pair of judges has the nearest approach to common tastes in
beauty ?

[Hint : Find rank correlation between the judgments of the first and second,
second and third, and first and third pairs of judges.
rk(I & II) = 0.721 ; rk(II & III) = 0.697 ; rk(I & III) = 0.552.
The I and II pair has the nearest approach.]
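The rank correlations in the hint to Question 57 can be reproduced with Spearman's formula r_s = 1 - 6Σd²/(n³ - n). In this sketch (helper names ours), values are ranked in ascending order and tied values receive the average rank. For every tied group, the correction (m³ - m)/12 is added to Σd². The judges' ranks here contain no ties, so the correction is zero.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1     # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks

def tie_correction(values):
    """Σ(m³ - m)/12 over groups of m equal values; 0 when there are no ties."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())

def spearman_rank_r(a, b):
    """Spearman's r_s = 1 - 6(Σd² + tie corrections) / (n³ - n)."""
    n = len(a)
    ra, rb = average_ranks(a), average_ranks(b)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    correction = tie_correction(a) + tie_correction(b)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Question 57: ranks given by judges X, Y and Z to entries A to J
X = [8, 7, 5, 4, 9, 10, 6, 2, 1, 3]
Y = [7, 8, 9, 3, 10, 6, 5, 4, 2, 1]
Z = [10, 9, 8, 7, 6, 5, 3, 4, 1, 2]
for name, p, q in [("I & II", X, Y), ("II & III", Y, Z), ("I & III", X, Z)]:
    print(name, round(spearman_rank_r(p, q), 3))
# I & II 0.721 / II & III 0.697 / I & III 0.552
```

The same function gives 0.619 for the two judges of Question 63, if Judge B's last rank is read as 8.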


58. Calculate Karl Pearson’s coefficient of correlation between the values
of x and y from the following data :
x   100   110   115   116   120   125   130   135
y    18    18    17    16    16    15    13    10
(B. Com., Andhra, 1970)
[r = -0.915]
59. (a) From the scatter diagram decide which of the two pairs of variables
shows the greater correlation. Explain your answer.

(b) Prove that :
The coefficient of correlation always lies between ±1.

(B.A. Hons. Econ., Delhi, 1971)
[(a) First]

60. In an aptitude test two judges rank the 10 competitors in the following
order :
Individual            1  2  3  4  5  6  7  8  9  10
Ranking by Judge I    6  4  3  1  1  7  9  8  10  5
Ranking by Judge II   4 1 6 TARS, 8 10 9 3 2
Is there any concordance between the two judges ?
(B. Com., Bombay, 1970)
[r = +0.182]
61. 92 students were examined in accountancy and statistics. They were
classified according to the marks obtained by them as is shown in the following
table. To what extent is the knowledge of the students in the two subjects
related ?


Marks in     | Marks in Accountancy
Statistics   | 20-   25-   30-   35-40


15— 20 2 — 

AT DA SiG – 
35— -— SAL 2 
45— _ 5 1 1 
| 55— — — = 1 


| 
25— | 


(B. Com., Nagpur, 1971) 
[r = +0.622]
62. Following are the ranks obtained by 10 students in two subjects,
Statistics and Mathematics :
Statistics 1:325) SIRES, EE TEA Gilley Bite Be 19:540. » 
Mathematics 133 4— 1:5 25 О УЗЫШ ШАО GLB 


To what extent is the knowledge of students in the two subjects related ?
(B. Com., Bombay, 1971)
[r_s = 0.758]
63. Two judges gave the following ranks to a series of eight one-act plays
in a drama competition. Examine the relationship between their judgments :

Judge A   8  7  6  3  2  1  5  4
Judge B   7  5  4  1  3  2  6  8
(B. Com., Bombay, 1972)
[r_s = +0.619]
64. Calculate the coefficient of correlation from the following data : 


Marks in statistics 20 3045 28 — 17 19 23 35 | 13. 16 18 


Marks in | 8 RHET I T Re Cy des За тате „29. bot) 
(B. Com., Delhi, 1969)
[r— 107836 


65. The following table gives the index number Saal оро аце оп іп 
а cou-try and the number of unemployed persons in dne sue Ep ment on 
eight consecutive years. Calculate the coefficient of correlation


your result, z No. of registered 
Year Index of шщ ДЕЕ 
p (in thousands) 

1954 100 de 

1955 10" 1370 

1956 103 11:5 

1957 105 120 

1958 106 * 12-5 

1959 104 156 

1960 103 208

1961 98 (B. Com., Bombay, 1971)


[r = -0.523]
66. Calculate coefficient of correlation between the ages of husbands and
wives from the following data and find its probable error.


Age of      Age of husbands

wives 20—30 30—40 40—50 50-60 60—70 Total 
15 25 5 9 3 zn iE 17 
25—35 = 10 25 2 — 37 
35—45 =- 1 12 2 — 15 
45—55 Ae ur 4 16 5 25 
55-65 — -— — 4 2 Gia 

7 100 




Total 20 24 24

(B. Com., Nagpur, 1971)
[r = +0.795, P.E. = 0.025]

67. The following table gives a bivariate frequency distribution of 50 clerks
according to age in years and pay in rupees :

                 Pay
Age 250—300 300—350 350—400 400—450 
20—30 8 3 ш. Lu 
30—40 2 5 2 2 
40—50 — 2 9 6 
50 - 60 — — 5 6 


Give your comments on the value of the correlation coefficient.
(B. Com., Bombay, 1972)
[r = +0.764]
68. Find the coefficient of correlation from the following data :
X   300   350   400   450   500   550   600   650   700
Y   800   900  1000  1100  1200  1300  1400  1500  1600
(B. Com., Pass, Delhi, 1973)
[r = +1]
69. A study is made relating aptitude test scores to productivity in a 
factory after three months training of personnel. The following are the figures 
regarding six randomly selected workers : 


Aptitude Score X        9   18   18   20   20   23
Productivity Index Y   23   33   23   42   29   32


Find the coefficient of correlation between aptitude score and productivity
index. (B. Com., Meerut, 1972)
[r = +0.557]


70. Calculate the coefficient of rank correlation of the following data of 
marks of eight students in Accountancy and statistics. 


Student No. 1 2 3 4 5 6 7 8 
Marks in Statistics 52 63 45 36 72 65 45 25 
Marks in Accountancy 62 53 51 25 19 43 60 33

[r = 0.693] (B. Com., Bombay, 1910)


71. Given below are the sales realized by two salesmen A and B in 10
different districts :


District No.              1   2   3   4   5   6   7   8   9  10
Sales by A (in '000 Rs.) js  40  65  50  15  80  75  60  60  30
Sales by B (in '000 Rs.) 30  40  75  60  10  85  80  60  70  35
Calculate Karl Pearson’s coefficient of correlation and interpret it.
(B. Com., Poona, 1973)
[r = +0.984]
72. Following are the percentage figures of expenditures incurred on clothing
and entertainment by an average working class family in a period of 10 years :

Year   % Expenditure on   % Expenditure on
       clothing (X)       entertainment (Y)
1961        21                 п
1962        27                 8
1963        31                 5
1964        32                 3
1965        20                13
1966        25                10
1967        33                 2
1968        30                 7
1969        28                 9
1970        22                 2

Compute (i) Pearson's coefficient of correlation between X and Y, and (ii)
Spearman’s rank correlation coefficient, and comment on your results.

[r = -0.592 ; r_s = -0.6] (B. Com., Poona, 1973)


73. Compute the correlation coefficient and the lines of regression for the
following data :


ADVERTISING EXPENDITURE (Rs. '000)
Sales revenue (Rs. '000)   5-15   15-25   25-35   35-45   Total


75—125 Li 1 — — 5 

125—175 7 6 2 T 16 е 
175—225 1 3 4c atnlong 10: 
225—275 1 1 3 4 9 

Total 13 1 9 Ыз 40 


[М.В.А. (P.T.) Delhi, 1973] 
74. From the data given below find the number of items :

r = 0.5, Σxy = 120, Σx² = 90,
standard deviation of y series = 8
(where x and y denote deviations from arithmetic average)
(M. Com., Delhi, 1972)
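One route through Question 74 is to eliminate σx. This assumes that σ is computed with divisor N, as elsewhere in this chapter:

```latex
\begin{aligned}
\sigma_x &= \sqrt{\tfrac{\sum x^2}{N}}, \qquad
r = \frac{\sum xy}{N\,\sigma_x\,\sigma_y} = \frac{\sum xy}{\sigma_y\sqrt{N\sum x^2}} \\
0.5 &= \frac{120}{8\sqrt{90N}} \;\Rightarrow\; \sqrt{90N} = \frac{120}{0.5 \times 8} = 30
      \;\Rightarrow\; 90N = 900 \;\Rightarrow\; N = 10 \\
\text{Check: } \sigma_x &= \sqrt{90/10} = 3, \qquad \frac{120}{10 \times 3 \times 8} = 0.5
\end{aligned}
```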
74. (a) Explain clearly the meaning of correlation between two variables. 


(b) The coefficient of correlation (r) between consumption expenditure (c)
and disposable income (y) in a study was found to be +0.8. What percentage of
variations in c is explained by variation in y ? (B.A. Hons. Econ., Delhi, 1973)


75. Calculate the coefficient of correlation between the ages of husbands
and wives from the following table :


Age of wives (in ys.) 
Age of husbands 10—20 20—30 30—40 40—50 50—60 


(in yrs.)

15—25 5 4 = a z^ 
25—35 4 15 10 — — 
35—45 — 10 16 6 — 
45—55 — — 6 10 5 
55—65 23 = — I" 4 


(M. Com., Meerut, 1973)
76. Calculate Karl Pearson's coefficient of correlation from the following
data : 
Y-Series X-Series 
10—13 6—9 2—5 Total 
20—24 1 — 3 
15—19 1 3 — 4 
10—14 = 2 2 4 
5—9 i 1 f 2 
0—4 -= 1 1 2 
(M.A. Econ., Jabalpur, 1975) 
77. Calculate the coefficient of correlation between the values of X and
Y from the data given below :
X    78    89    96    69    59    79    68    61
Y   125   137   156   112   107   136   123   108
Take 69 as working mean for X and 112 that for Y.
[r = +0.96] (B. Com., Punjab, 1974)


78. The following table gives the result of matriculation examination held
in 1974. Calculate Karl Pearson's coefficient of correlation and its probable
error. Do you think age is related to percentage of failures ?

Age of students   Percentage    Age of students   Percentage
(years)           of failures   (years)           of failures
13-14                 39        18-19                 39
14-15                 40        19-20                 48
15-16                 43        20-21                 44
16-17                 43        21-22                 56
17-18                 36

(B. Com., Rajasthan, 1974)

[Hint : Take mid-points of age.]
[r = +0.658, P.E. = 0.126]
79. (a) If r between the annual values of exports during the last 10 years
and the annual number of children born during the same period is +0.8, what
inference, if any, would you draw ?

(b) Calculate K. Pearson's coefficient of correlation from the following
data :


Husband’sage 24 2171 28 28 291. 30 32 33 35/15/35: 7940 
Wife's age (8 749 0277-25 292. 28 28 30.5. ЗӨ ee 


(B. Com., Kurukshetra, 1975)

[r = 0.505]
80. (a) The judgments of three judges in a beauty competition are given
below :
X   1   2   4   6   7   5   3
Y   2   1   4   5   6   7   3
Z   2   3   5   4   7   1   6
Which pair of judges has the nearest approach to common tastes in
beauty ? (B.A. Hons. Econ., Kurukshetra, 1975)
[R_XY = 0.857 ; R_YZ = 0.071 ; R_XZ = 0.429]

(b) How do you interpret each of the following values of r (coefficient of
correlation) : +1, -1, -0.9 and +0.81 ?
(M. Com., Delhi, 1975)

81. In two sets of variables X and Y with 50 observations each, the
following data were observed :

Mean of X = 10, S.D. of X = 3
Mean of Y = 6, S.D. of Y = 2
Coefficient of correlation between X and Y is 0.3.

However, on subsequent verification it was found that one value of X (= 10)
and one value of Y (= 6) were inaccurate and hence weeded out. With the remaining
49 pairs of values, how is the original value of r affected ? (M. Com., Delhi, 1975)
[r = 0.3]
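Question 81 turns on the running totals. The sketch below (all function names hypothetical) rebuilds ΣX, ΣX², ΣY, ΣY² and ΣXY from N, the means, the standard deviations (divisor N) and r. It then removes the pair (10, 6) and recomputes r. The removed pair lies exactly at the two means, so the deviation sums do not change and r stays at 0.3. The same `drop_pair` and `add_pair` steps could correct wrongly copied pairs, as in Question 42.

```python
from math import sqrt

def sums_from_summary(n, mean_x, sd_x, mean_y, sd_y, r):
    """Rebuild N, ΣX, ΣX², ΣY, ΣY², ΣXY from N, means, S.D.s (divisor N) and r."""
    return (n,
            n * mean_x, n * (sd_x ** 2 + mean_x ** 2),
            n * mean_y, n * (sd_y ** 2 + mean_y ** 2),
            n * (r * sd_x * sd_y + mean_x * mean_y))

def r_from_sums(n, sx, sxx, sy, syy, sxy):
    """r = (NΣXY - ΣXΣY) / sqrt((NΣX² - (ΣX)²)(NΣY² - (ΣY)²))."""
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def drop_pair(sums, x, y):
    """Take one (X, Y) pair out of the totals."""
    n, sx, sxx, sy, syy, sxy = sums
    return n - 1, sx - x, sxx - x * x, sy - y, syy - y * y, sxy - x * y

def add_pair(sums, x, y):
    """Put one (X, Y) pair into the totals (e.g. a correctly copied pair)."""
    n, sx, sxx, sy, syy, sxy = sums
    return n + 1, sx + x, sxx + x * x, sy + y, syy + y * y, sxy + x * y

totals = sums_from_summary(50, 10, 3, 6, 2, 0.3)
reduced = drop_pair(totals, 10, 6)
print(round(r_from_sums(*totals), 4))    # 0.3
print(round(r_from_sums(*reduced), 4))   # 0.3
```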


SECTION 11 
REGRESSION ANALYSIS 


1. (a) What do you understand by linear regression ? 
(B. Com., Delhi, 1969) 


(b) What is regression ? Why are there, in general, two regression lines ?
When do they coincide ? Explain the use of regression equations in an economic
enquiry. (M.A. Econ., Punjab, 1970)


2. Compare and contrast the roles of correlation and regression in study-
ing the interdependence of two variates. (M.A. Econ., Delhi, 1969)


3. Explain the concept of regression. Why should there be in general
two lines of regression for each bivariate frequency distribution ?
(B.A. Hons. Econ., Delhi, 1974)


4. (a) Explain the concept of ‘regression’ and comment on its utility.
(M. Com., Delhi, 1968)


(b) Show that the coefficient of correlation between two variables cannot 
exceed unity. (B.A. Hons. Econ., Delhi, 1974) 


5. Define the terms ‘regression’, ‘linear regression’ and ‘curvilinear
regression’.
Describe the method of least squares and show how it can be used to fit a
linear regression. How is linearity of regression tested ?
(M.A. Econ., Delhi, 1964)


6. Distinguish clearly between ‘correlation’ and ‘regression’ as concepts
used in statistical analysis. (M. Com., Delhi, 19:0)

7. What are the lines of regression ? Can the relation between two variables
given by a set of data always be studied by drawing those lines for the data ?
(M.A. Econ., Delhi, 1966)


8. If the regression lines of y on x and of x on y are given respectively by
y = a₀ + a₁x and x = b₀ + b₁y, prove that a₁b₁ = r². (B.A., Madras, 1967)

9. (a) Show that correlation coefficient is the geometric mean between
regression coefficients. If the sign of a regression coefficient is known how would
you find the sign of the coefficient of correlation ? If one of the regression
coefficients is negative, what type of variation would you expect in the original
series of pairs of observations ? (I.C.W.A., 1966)

(b) What is the coefficient of correlation ? Explain how it provides a suitable
measure of association between two variables. In what way does the concept
of correlation differ from that of regression ? (M.A. Econ., Punjab, 1972)

10. Explain the concepts of regression and ratio of variation and state
their utility in the field of economic enquiries. (M.A. Econ., Meerut, 1975)

11. Obtain the lines of regression for the following data :
X   1   2   3   4   5   6   7   8   9
Y   9   8  10  12  11  13  14  16  15
Obtain the estimate of Y which should correspond on the average to
X = 6.2. (B. Com., Madras, 1970)
[r = 0.95 ; X = 0.95Y - 6.4 ;
Y = 0.95X + 7.25 ; when X = 6.2, Y = 13.14]
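Both lines of regression for Question 11 follow from the least-squares normal equations. The sketch below (names ours) computes them through deviation sums and reads off r as the geometric mean of the two regression coefficients. It uses 8 as the second Y value, the reading that reproduces all of the printed answers. The same call also fits S = mT + b for the potassium bromide data of Question 22 in the regression exercises.

```python
def regression_lines(xs, ys):
    """Least-squares lines Y = a + b_yx*X and X = c + b_xy*Y, together with r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx, b_xy = sxy / sxx, sxy / syy        # the two regression coefficients
    a = my - b_yx * mx                       # Y-line passes through (mean X, mean Y)
    c = mx - b_xy * my                       # X-line passes through the same point
    # r is the geometric mean of b_yx and b_xy and carries their common sign.
    r = (b_yx * b_xy) ** 0.5 * (1 if sxy >= 0 else -1)
    return a, b_yx, c, b_xy, r

def equation(lhs, slope, intercept, rhs):
    """Format lhs = slope*rhs ± intercept, e.g. 'Y = 0.95X + 7.25'."""
    sign = "+" if intercept >= 0 else "-"
    return f"{lhs} = {slope:.2f}{rhs} {sign} {abs(intercept):.2f}"

X = list(range(1, 10))
Y = [9, 8, 10, 12, 11, 13, 14, 16, 15]       # second value read as 8
a, b_yx, c, b_xy, r = regression_lines(X, Y)
print(equation("Y", b_yx, a, "X"))               # Y = 0.95X + 7.25
print(equation("X", b_xy, c, "Y"))               # X = 0.95Y - 6.40
print(round(r, 2), round(a + b_yx * 6.2, 2))     # 0.95 13.14

intercept, slope, _, _, _ = regression_lines([0, 20, 40, 60, 80], [54, 65, 75, 85, 96])
print(equation("S", slope, intercept, "T"), round(intercept + slope * 50, 1))  # S = 0.52T + 54.20 80.2
```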
12. Find the regression equation from the following data : 


Age of husband 1817195 ДЕГН 224182294 245 1252526. .27 
Age of wife 17 I1 Be MS LL 19, 212 19,20 — 21 22 


(B. Com., Madras, 1971)
[Reg. eq. of X on Y : X = 1.747Y - 10.52 ;
Reg. eq. of Y on X : Y = 0.527X + 7.04]


13. The following table shows the test scores made by salesmen on an
intelligence test and their weekly sales :


Salesmen 1 2 3 4 5 6 7 8 9 10 
Test scores 40 70 50 60 80- 50 90 40 60 60 
Sales 2:5 60 45 Eas ООО сл Өр 0 
(in thousand 

units) 


Calculate the regression line of sales on test score and estimate the most
probable weekly sales volume if a salesman makes a score of 70.
(B. Com., Madras, 1967)
[Y = 0.06X + 0.45 ; the most probable weekly sales volume for a score
of 70 is 4.65 thousand units, i.e. 4,650 units]
14. Estimate the value of X corresponding to Y = 200 from the following
data :


x 250 284 297 338 463 393 


Y 137 147 184 196 276 260 
(B. Com., Madras, 1964)


200— 


15. From the data given below estimate the most likely height of a
father whose son's height is 70 inches.

Fathers : Mean height is 67 inches with a standard deviation of 3.5
inches.

Sons : Mean height is 65 inches with a standard deviation of 2.5 inches.
Coefficient of correlation between the heights of fathers and sons is +0.8.
(M. Com., Agra, 1967)
[72.6 inches]


16. From the following data, write down the equations of the regression
lines :


Average Standard deviation 
Marks in Maths, 844 
Marks in English 35'6 10:5 
r=0°62 


Estimate the marks in Mathematics corresponding to 60 marks in English. 
(B. Com., Madras, 1968) 
1965!

17. The following data give the correlation coefficient, means and standard
deviations of rainfall and yield of paddy in a certain tract :

                     Yield per acre   Annual rainfall
                     in lb.           in inches
Mean                     973.5            18.3
Standard Deviation        38.4             2.0
r = 0.58

Estimate the most likely yield of paddy when the annual rainfall is
22 inches, other factors being assumed to remain the same.

[1,014.7 lb.]


18. Find the most likely price in Bombay (X) corresponding to the price
of Rs. 70 at Calcutta (Y) from the following data :

Average price at Calcutta = 65
Average price at Bombay = 67
S.D. at Calcutta = 2.5
S.D. at Bombay = 3.5
r_xy = +0.8

(M.A. Econ., Lucknow, 1967 ; B. Com., Mysore, 1969 ;
M. Com., Agra, 1970)

[X = 1.12Y - 5.8 ; X = Rs. 72.60 when Y = 70]
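Where only means, standard deviations and r are given (Questions 15, 17, 18, 20 and 29), the estimate comes from the regression equation X - X̄ = r(σx/σy)(Y - Ȳ). The helper below is a hypothetical name for that one formula. For Question 18 it assumes S.D.s of 2.5 (Calcutta) and 3.5 (Bombay), the values consistent with the printed answer.

```python
def estimate_from_summary(given, mean_given, sd_given, mean_target, sd_target, r):
    """Regression estimate: target - mean_target = r * (sd_target / sd_given) * (given - mean_given)."""
    b = r * sd_target / sd_given      # regression coefficient of target on given
    return mean_target + b * (given - mean_given)

# Question 18: Bombay price X when the Calcutta price Y is Rs. 70
print(round(estimate_from_summary(70, 65, 2.5, 67, 3.5, 0.8), 2))   # 72.6
# Question 20: (i) husband's age for wife aged 12, (ii) wife's age for husband aged 33
print(round(estimate_from_summary(12, 22, 5, 25, 4, 0.8), 1))       # 18.6
print(round(estimate_from_summary(33, 25, 4, 22, 5, 0.8), 1))       # 30.0
```

Note that the roles of the two variables must be set explicitly. Estimating X from Y uses σx/σy, while estimating Y from X uses σy/σx, which is why the two regression lines differ unless r = ±1.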


19. Two random variables have the least square regression lines with
equations 3x + 2y - 26 = 0 and 6x + y - 31 = 0. Find the mean values and the
correlation coefficient between x and y.

(B.A. Hons. Econ., Delhi, 1966 ; M.A. Econ., Delhi, 1970 ;
M.A. Econ., Meerut, 1973) [x̄ = 4, ȳ = 7, r = -0.5]
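For Questions 19 and 24, both regression lines pass through the point of means, so the means solve the two equations simultaneously, and r² = b_yx · b_xy. The sketch below (function name ours) first treats line 1 as Y on X. If that would make r² exceed 1, it swaps the roles, which is the usual textbook test.

```python
from fractions import Fraction
from math import sqrt

def analyse_regression_lines(line1, line2):
    """Means, b_yx and r from two regression lines, each (p, q, k) for p*x + q*y = k.

    The means are where the lines meet. Line 1 is first taken as y on x and
    line 2 as x on y; if that makes b_yx * b_xy (= r**2) exceed 1, the roles swap.
    """
    (p1, q1, k1), (p2, q2, k2) = line1, line2
    det = p1 * q2 - p2 * q1
    mean_x = Fraction(k1 * q2 - k2 * q1, det)
    mean_y = Fraction(p1 * k2 - p2 * k1, det)
    b_yx, b_xy = Fraction(-p1, q1), Fraction(-q2, p2)
    if b_yx * b_xy > 1:
        b_yx, b_xy = Fraction(-p2, q2), Fraction(-q1, p1)
    r = sqrt(float(b_yx * b_xy))
    if b_yx < 0:          # r takes the common sign of the regression coefficients
        r = -r
    return mean_x, mean_y, b_yx, r

# Question 19: 3x + 2y - 26 = 0 and 6x + y - 31 = 0, written as p*x + q*y = k
mx, my, b_yx, r = analyse_regression_lines((3, 2, 26), (6, 1, 31))
print(mx, my, r)                              # 4 7 -0.5
# Question 24: 8X - 10Y + 66 = 0 and 40X - 18Y = 214, with variance of X = 9
mx, my, b_yx, r = analyse_regression_lines((8, -10, -66), (40, -18, 214))
sd_y = float(b_yx) * 3 / r                    # from b_yx = r * sd_y / sd_x, sd_x = 3
print(mx, my, round(r, 2), round(sd_y, 2))    # 13 17 0.6 4.0
```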


20. The coefficient of correlation between the ages of husbands and wives
in a community was found to be +0.8, the average of husbands’ age was 25 years
and that of wives’ age 22 years. Their standard deviations were, respectively,
4 and 5 years. Find, with the help of regression equations,

(i) the expected age of husband when wife's age is 12 years, and
(ii) the expected age of wife when husband's age is 33 years.
(M.A., Rajasthan, 1967)
[(i) 18.6 years ; (ii) 30 years]
21. The following statistical coefficients were deduced in the course of an
examination of the relationship between yield of wheat and the amount of
rainfall :

                     Yield in lb.   Annual rainfall
                     per acre       in inches
Mean                     985.0          12.8
Standard Deviation        70.1           1.6
Coefficient of correlation between yield and rainfall : +0.52

From the above data, calculate

(i) the most likely yield of wheat per acre when the annual rainfall is
9.2 inches, and

(ii) the annual rainfall for a yield of 1,400 lb. per acre.
(M. Com., Agra, 1969)

[X = 22.78Y + 693.416 ; Y = 0.012X + 0.98 ;
(i) most likely yield when annual rainfall is 9.2 inches = 903 lb. ;
(ii) probable rainfall when the yield of wheat is 1,400 lb. = 17.78 inches]

22. In the following table S is the weight of potassium bromide which will
dissolve in 100 gm. of water at T°C. Fit an equation of the form S = mT + b by
the method of least squares. Use this relation to estimate S when T = 50°.


T 0 20 40 60 80 
S 54 65 75 85 96 
(С.Ж. А., 1972) 
[S = 0.52T + 54.2 ; S = 80.2 when T = 50]
23. An investigation into the demand for television sets in 7 towns has
resulted in the following data :

Population (thousands) (X)     11   14   14   17   17   21   25
No. of TV sets demanded (Y)    15   27   27   30   34   38   46

Fit a linear regression of Y on X, and estimate the demand for TV sets for a
town with a population of 30 thousand. (M. Com., Meerut, 1974)

[Y = 2X - 3 ; Y = 57 when X = 30]
24. In a partially destroyed laboratory record of an analysis of correlation
data, the following results only are legible :

Variance of X = 9
Regression equations :
8X - 10Y + 66 = 0
40X - 18Y = 214
What are :

(i) the mean values of X and Y,
(ii) the standard deviation of Y, and
(iii) the coefficient of correlation between X and Y ?
(M. Com., Agra, 1972)
[X̄ = 13, Ȳ = 17, σy = 4, r = 0.6]
25. There are two series of index numbers, P for price index and S for
stock of commodities. The mean and standard deviation of P are 100 and 8
respectively, and those of S are 103 and 4.
The correlation coefficient between the two series is 0.4. With these data
work out a linear equation to read off values of P for various values of S. What
are the assumptions involved in your procedure?
Can the same equation be used to read off values of S for different values
of P? If not, state why not, and give the appropriate equation.
(I.A.S., 1972)
[P = 0.8S + 17.6; S = 0.2P + 83]
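When only means, standard deviations and r are known, as in Exercise 25, each regression line follows from slope = r × (s.d. of dependent) / (s.d. of independent) through the point of means. The function `regression_line` below is a hypothetical helper; calling it with the roles swapped shows why a second equation is needed to read off S from P.

```python
def regression_line(mean_dep, sd_dep, mean_ind, sd_ind, r):
    """Slope and intercept of the regression of a dependent series on an
    independent one, from means, standard deviations and r."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return slope, intercept

# P on S: used to read off P for given values of S
p_slope, p_int = regression_line(100, 8, 103, 4, 0.4)
# S on P: a different line, needed to read off S for given values of P
s_slope, s_int = regression_line(103, 4, 100, 8, 0.4)
print(f"P = {p_slope:.1f}S + {p_int:.1f}")   # P = 0.8S + 17.6
print(f"S = {s_slope:.1f}P + {s_int:.1f}")   # S = 0.2P + 83.0
```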


26. The ages of husbands and wives in a community were found to have
a correlation coefficient of +0.8. The average age of husbands was 25 years and
the average age of wives 22 years; the standard deviations were 5 and 4 years
respectively.

Draw the regression lines and from the lines measure :
(i) the expected age of husband when wife's age is 22 years, and
(ii) the age of wife when husband's age is 28 years.

(B. Com., Mysore, 1966)
[X = Y + 3; Y = 0.64X + 6
X22 = 25; Y28 = 23.92]

27. From the data given below find :
(a) the coefficient of correlation between the ages of husbands and the ages of
wives,
(b) the regression equations, and




(c) the most likely age of wife when husband's age is 25.

Age of husbands (in years)   22   23   23   24   26   27   27   28   30   30
Age of wives (in years)      18   ...   26

(B. Com., Bangalore, 1968)
[(a) r = 0.95
(b) Y = 0.8X + 1.2; X = ...
(c) 21.2 years]


28. You are given the following results for the heights (X) and weights (Y)
of 1,000 workers in a factory :

X̄ = 68 inches, σx = 2.5 inches
Ȳ = 150 lb., σy = 20 lb.
rxy = 0.6

Estimate from the above data :
(i) the height of a particular factory worker whose weight is 200 lb., and

(ii) the weight of a particular factory worker who is 5 feet tall.
(B. Com., Mysore, 1968; M. Com., Nagpur, 1971)
[Y = 4.8X - 176.4; X = 0.075Y + 56.75
Y60 = 111.6 lb.; X200 = 71.75 inches]
29. Given the following data, calculate the marks in Mathematics obtained
by a student who has scored 60 marks in English :

Arithmetic average of marks in Mathematics (all students)    80
Arithmetic average of marks in English (all students)        50
S.D. of marks in Mathematics                                 15
S.D. of marks in English                                     10
Coefficient of correlation between marks in
Mathematics and marks in English                             -0.4
(B. Com., Bombay, 1968)

[Y60 = 74]


30. Explain the terms (a) coefficient of regression, and (b) lines of
regression.

Obtain the equation of the regression line for husband's age (Y) on wife's
age (X) at marriage from the following data :


Husband's age (436$. 123 OT. 28 28 29 30 31 33 35 
Wife's age (X) 29 18 20 220.927 21 29 27 29:228. 


Hence obtain the husband's age when wife's age is 16.
(B.A., Kerala, 1969)
[Y = 0.775X + 11.25; Y16 = ... years]


31. The following data give the hardness (X) and tensile strength (Y) of 7
pieces of a metal in certain units. Find the linear regression equation of Y
on X.

X    146   152   158   164   170   176   182
Y     75    78    77    79    82    85    86
(I.C.W.A., July, 1969)
[Y = 0.31X + 29.46]
32. Given the standard deviations σx, σy for two correlated variates x and
y in a large sample :

(a) What is the standard error in estimating y from x if r = 0?
(b) By how much is the error reduced if r is increased to 0.5?

(c) What is the standard error in estimating y from x if r = 1?
(M.A. Econ., Delhi, 1966)
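Exercise 32 gives no numbers, so this sketch works in units of σy. It assumes the standard formula for the standard error of estimate, S_yx = σy√(1 - r²); the helper name `std_error_estimate` is hypothetical.

```python
import math

def std_error_estimate(sigma_y, r):
    """Standard error of estimating y from x: S_yx = sigma_y * sqrt(1 - r**2)."""
    return sigma_y * math.sqrt(1 - r ** 2)

sigma_y = 1.0                                  # express results as multiples of sigma_y
se_r0 = std_error_estimate(sigma_y, 0.0)       # no correlation: error equals sigma_y
se_r_half = std_error_estimate(sigma_y, 0.5)   # about 0.866 * sigma_y
se_r1 = std_error_estimate(sigma_y, 1.0)       # perfect correlation: no error
reduction = (se_r0 - se_r_half) / se_r0        # fractional reduction
print(round(se_r_half, 3), round(reduction, 3), se_r1)
```

So raising r from 0 to 0.5 cuts the error by only about 13.4 per cent, and at r = 1 the estimate is exact.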
33. What would be the lines of regression if (i) r is equal to +1, (ii) r is
equal to -1, and (iii) r is equal to zero? Give your interpretation in each case.
(M.A. Econ., Punjab, 1965)
34. Explain the terms correlation and regression. What is the relation
between the coefficients of correlation and regression? (B.Sc., Madras, 1970)
35. For a bivariate data,
the mean value of X = 53.2
the mean value of Y = 27.9
the regression coefficient of Y on X = -1.5
the regression coefficient of X on Y = -0.2
Find (i) the most probable value of Y when X = 60, and (ii) r, the
coefficient of correlation. (B. Com., Bombay, 1970)
[(i) 17.7; (ii) r = -0.548]
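For Exercise 35 (and its repeat, Exercise 61), r is the geometric mean of the two regression coefficients and takes their common sign. A minimal sketch; `r_from_regression_coefficients` is a hypothetical helper that also rejects impossible pairs.

```python
import math

def r_from_regression_coefficients(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), with the common sign of the coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("b_yx * b_xy cannot exceed 1")
    return math.copysign(math.sqrt(product), b_yx)

x_bar, y_bar = 53.2, 27.9
b_yx, b_xy = -1.5, -0.2
y_at_60 = y_bar + b_yx * (60 - x_bar)   # regression of Y on X through the means
r = r_from_regression_coefficients(b_yx, b_xy)
print(round(y_at_60, 2), round(r, 3))   # 17.7 -0.548
```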


36. Given the following data, calculate the expected value of Y when X = 12 :

            X       Y
Average    7.6    14.8
S.D.       3.6     2.5
r = 0.99
(B. Com., Mysore, 1970)

(17.825)

37. Calculate the coefficient of correlation and obtain the lines of
regression for the following data :
X    1    2    3    4    5    6    7
Y    9    8   10   12   11   13   14
Obtain the estimate of Y which should correspond on the average to
X = 6.2. (M.B.A., Delhi, 1971)

[Y = 0.929X + 7.284
Y6.2 = 13.0438]


38. Given the following values, find the expected value of X when
Y = 12 :
Average of X series = 25
Average of Y series = 22
Standard deviation of X series = 4
Standard deviation of Y series = 5
Coefficient of correlation = 0.8

(M. Com., Allahabad, 1973)
[X = 0.64Y + 10.92
X12 = 18.6]


—*— 
39. The following table gives the ages of husbands and wives for 50 newly
married couples. Find the two regression lines. Also estimate

(a) the age of husband when wife is 20, and
(b) the age of wife when husband is 30.

                       Age of husband
Age of wife     20-25    25-30    30-35    Total
16-20              9       14        -       23
20-24              6       11        3       20
24-28              -        -        7        7
Total             15       25       10       50

(I.C.W.A., 1970)
[X = 0.723Y + 12.02; Y = 0.47X + 8.02
X20 = 26.48; Y30 = 22.12
X denotes husbands' ages and Y denotes wives' ages]
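For a grouped table like Exercise 39, the usual approach is to represent each class by its midpoint and weight by the cell frequencies. The sketch below assumes exactly that (midpoints 22.5, 27.5, 32.5 for husbands and 18, 22, 26 for wives) and works with raw sums rather than step deviations; both give the same coefficients.

```python
# Class midpoints (assumption: each class is represented by its midpoint)
husband_mid = [22.5, 27.5, 32.5]   # X: 20-25, 25-30, 30-35
wife_mid = [18, 22, 26]            # Y: 16-20, 20-24, 24-28
# freq[i][j] = number of couples with wife in class i and husband in class j
freq = [
    [9, 14, 0],
    [6, 11, 3],
    [0, 0, 7],
]

n = sx = sy = sxx = syy = sxy = 0.0
for i, y in enumerate(wife_mid):
    for j, x in enumerate(husband_mid):
        f = freq[i][j]
        n += f
        sx += f * x
        sy += f * y
        sxx += f * x * x
        syy += f * y * y
        sxy += f * x * y

x_bar, y_bar = sx / n, sy / n
cov = sxy / n - x_bar * y_bar
var_x = sxx / n - x_bar ** 2
var_y = syy / n - y_bar ** 2

b_xy = cov / var_y                            # husband's age (X) on wife's age (Y)
b_yx = cov / var_x                            # wife's age (Y) on husband's age (X)
husband_at_20 = x_bar + b_xy * (20 - y_bar)   # estimate (a)
wife_at_30 = y_bar + b_yx * (30 - x_bar)      # estimate (b)
print(round(b_xy, 4), round(b_yx, 4), round(husband_at_20, 2), round(wife_at_30, 2))
```

This gives about 26.48 and 22.13; the printed 22.12 comes from rounding the coefficients before substituting.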


40. Find the most likely price in Madras corresponding to a price of
Rs. 75 at Bangalore from the following data :

Average price at Madras              Rs. 55
Average price at Bangalore           Rs. 68
Standard deviation of Madras         Rs. 2.5
Standard deviation of Bangalore      Rs. 3.5

The coefficient of correlation between the two prices of the commodity in
the two towns is +0.78. (M. Com., Allahabad, 1968)

[Y75 = Rs. 58.9]


41. Obtain the two regression lines and estimate the blood pressure when
the age is 50 years from the following data :

Age    Blood Pressure    Age    Blood Pressure
(X)         (Y)          (X)         (Y)
56          147          55          150
42          125          49          145
72          160          38          155
36          118          42          140
63          149          68          152
47          128          60          155

(M.A. Econ., Punjab, 1969)
[X = 0.614Y - 35.88
Y = 0.768X + 103.48
Y50 = 141.88]
42. The average weekly wage for the working class in Madras is Rs. 12 and
for that in Delhi Rs. 18; their respective standard deviations are Rs. 2 and Rs. 3
and the coefficient of correlation is 0.67. Find the most likely wage in Delhi
corresponding to the wage of Rs. 20 in Madras. (M. Com., Allahabad, 1970)
[Y = 1.005X + 5.94
Y20 = 26.04]


43. You are supplied with the following data only :
Variance of X = 36
12X - 15Y + 99 = 0
60X - 27Y = 321
Calculate
(a) the average values of X and Y,
(b) the standard deviation of Y, and
(c) the coefficient of correlation between X and Y.
(M. Com., Allahabad, 1969; M. Com., Agra, 1970)
[X̄ = 13, Ȳ = 17, r = 0.6, σy = 8]


44. The profits (Y) of a company in the Xth year of its life were observed
to be as shown below :

Years of life (X)                    1       2       3       4       5
Profits (Y) in lakhs of Rs.      1,250   1,400   1,650   1,950   2,300
Examine whether the linear regression of Y on X is
Y = 100 + 265X.

Hence or otherwise calculate the standard error of estimate.
(M.A. Econ., Agra, 1966)

[Y = 265X + 915]
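A sketch for Exercise 44: fit the least-squares line, then compute the standard error of estimate from the residuals. Textbook conventions differ on the divisor (N or N - 2), so both are shown; which one a particular examiner expects is an assumption.

```python
import math

X = [1, 2, 3, 4, 5]                    # years of life
Y = [1250, 1400, 1650, 1950, 2300]     # profits

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
a = y_bar - b * x_bar                  # least-squares line Y = a + bX

# Residuals from the fitted line and the unexplained sum of squares
residuals = [y - (a + b * x) for x, y in zip(X, Y)]
sse = sum(e ** 2 for e in residuals)
se_n = math.sqrt(sse / n)              # divisor N
se_n2 = math.sqrt(sse / (n - 2))       # divisor N - 2
print(f"Y = {a:.0f} + {b:.0f}X; S.E. = {se_n:.2f} (N), {se_n2:.2f} (N - 2)")
```

The fitted slope agrees with the proposed 265, but the intercept is 915 rather than 100, so the proposed line is not the least-squares regression.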


45. Compute the least squares regression of Y on X from the following
data :


X 89 86 74 65 64 63 66 67 72 79 
Y 9 91 Sp. 15 73 72 7 75 78 84 
(M.A. Econ., Punjab, 1969)
[Y = 0.808X + 20.92]


46. Find the coefficient of correlation (r) between X and Y from the
following data :

X    3    6    5    4    4    6    7    5
Y    3    2    3    5    3    6    6    4

Also find the line of regression of Y on X and predict the average value
of Y when X is 9. (M. Com., Meerut, 1969)
[r = +0.433
Y = 0.5X + 1.5
Y9 = 6]


47. (a) From the following data, obtain the line of regression of Y on X
and estimate the average value of Y when X = 8, 16, 24 :

X    2    6    8   11   13   13   13   14
Y    8    6   10   12   12   14   14   20

(b) Make additional necessary calculations, and find for the data in (a)
above, the regression of X on Y.

State clearly why there are usually two different lines of regression. Point
out the case when there is only one line of regression.

(c) For certain X and Y series which are correlated, the lines of regression
of Y on X and of X on Y are respectively 6Y = 5X + 90 and 15X = 8Y + 130. Also
... = 4. Find X̄, Ȳ, r and bxy. (M. Com., Meerut, 1970)

[(a) Y = 0.8125X + 3.875; Y8 = 10.375, Y16 = 16.875, Y24 = 23.375
(b) X = 0.8125Y + 0.25
(c) X̄ = 30, Ȳ = 40, r = 0.667, bxy = 0.533]
48. There are two series of index numbers, D for disposable personal
income and S for salaries of the company. The mean and standard deviation of
the D series are 120 and 15 respectively and of the S series 115 and 10. The
coefficient of correlation between the two series is 0.75. From the given
information obtain a linear equation for estimating the values of S for different
values of D. How will you interpret the values of S corresponding to different
values of D obtained from the equation? Can the same equation be used for
estimating values of D for different values of S? (M. Com., Rajasthan, 1966)
49. The following marks have been obtained by a class of students in
two papers :

Paper I -80 -45 2:535 56 — 58 — 60.. 65*— 68 .. Ho. 75 5 

Paper II 82 56 50 48 60 62 64 65 70 74 % 


Find the lines of regression and also calculate the correlation coefficient.
(M.B.A., Delhi, 1969)
f Y=0'39X +09 
| X¥=0'85Y+9°5 
Lr—07917. 
50. (a) What are regression coefficients and how do they differ from the
correlation coefficient?
(b) The regression equations calculated from a given set of observations
are :
x = -0.2y + 4.2
y = -0.8x + 8.4
Calculate :
(i) x̄ and ȳ, (ii) r, (iii) the estimated value of y when x = 4.
(I.C.W.A., 1968)

[(i) x̄ = 3, ȳ = 6; (ii) r = -0.4; (iii) 5.2]
51. Given :
Σx = 56, Σy = 40, Σx² = 524,
Σy² = 256, Σxy = 364, N = 8.
(i) Find the regression equation of x on y, and
(ii) the correlation coefficient. (I.C.W.A., 1967)


52. (a) Explain the meaning of regression of Y on X and X on Y.
(b) Discuss the importance of regression analysis in business and
economics.
(c) The following calculations have been made for closing prices of 12
stocks (X) on the Bombay Stock Exchange, along with the volume of sales in
thousands of shares (Y). From these calculations find the regression
equations :
ΣX = 580, ΣY = 370, ΣXY = 11,494,
ΣX² = 41,658, ΣY² = 17,206.
(B.A. Hons. Econ., Delhi, 1971)

[Y = 53.55 - 0.47X
X = 82.24 - 1.1Y]
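When only totals are supplied, as in Exercise 52(c) (or Exercise 51), the regression lines come from the corrected sums S_xy = ΣXY - ΣXΣY/N, S_xx = ΣX² - (ΣX)²/N and S_yy = ΣY² - (ΣY)²/N. A sketch with a hypothetical helper `regressions_from_sums`:

```python
def regressions_from_sums(n, sx, sy, sxx, syy, sxy):
    """Both regression lines from raw sums; each returned as (intercept, slope)."""
    x_bar, y_bar = sx / n, sy / n
    # Corrected sums of squares and products
    s_xy = sxy - sx * sy / n
    s_xx = sxx - sx ** 2 / n
    s_yy = syy - sy ** 2 / n
    b_yx = s_xy / s_xx
    b_xy = s_xy / s_yy
    return (y_bar - b_yx * x_bar, b_yx), (x_bar - b_xy * y_bar, b_xy)

(a_yx, b_yx), (a_xy, b_xy) = regressions_from_sums(12, 580, 370, 41658, 17206, 11494)
print(f"Y = {a_yx:.2f} {b_yx:+.3f}X")   # Y = 53.50 -0.469X
print(f"X = {a_xy:.2f} {b_xy:+.3f}Y")   # X = 82.31 -1.102Y
```

The small differences from the printed 53.55 and 82.24 arise from rounding the slopes to -0.47 and -1.1 before computing the intercepts.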
53. Given the following data, what will be the possible yield when the
rainfall is 29" :

                      Rainfall    Production
Mean                  25"         40 units per acre
Standard deviation    3"          6 units per acre

Coefficient of correlation between rainfall and production = 0.8.
(M.A. Econ., Agra, 1971)
[Y = 1.6X; 46.4 units per acre]
54. Find the line of regression of sales on output from the following
data and also calculate the correlation coefficient :

Years    Sales      Output
         (Units)    (Units)
1967     1,025      1,306
1968       853      1,076
1969       698        905
1970       970      1,295

(M.A. Econ., Punjab, 1971)
[Y = 0.744X + 34; r = 0.99]


55. The following data relate to the heights of fathers and sons :

Heights of fathers (inches) X :   71   68   73   69   67   65   66   67
Heights of sons (inches) Y :      69   72   70   70   68   67   68   64

Find the two regression equations and estimate the height of a son whose
father's height is 67.5 inches. Also find out the value of the coefficient of
correlation between the heights of fathers and sons. (M. Com., Nagpur, 1972)

[X = 0.525Y + 32.29; Y = ...; Y67.5 = ... inches]


56. Given below are the figures for supply and price of a certain commodity
for the last seven years :

Year      1963   1964   1965   1966   1967   1968   1969
Supply      80     84     86     88     92     96     97
Price       12     11     15     15     18     16     18
Obtain the regression equation of price on supply and hence calculate the
most likely price in 1970 when the supply is 110. (B. Com., Bombay, 1970)

[Y = 0.367X - 17.533; Y110 = 22.84]


57. Define regression and point out its usefulness in the field of economic
analysis. Find the regression coefficients of y on x and x on y for the following
data :

x    3    2   -1    6    4   -2    5
y    5   13   12   -1    2   20    0
[bxy = -0.301, byx = -2.036, r = -0.783] (B.A., Bombay, 1972)


58. For two variables X and Y the regression equation of X on Y is
X = 0.4Y - 3 and the regression equation of Y on X is 9Y = X + 13. Find the means
of X and Y and the coefficient of correlation between X and Y.

(B. Com., Bombay, 1972)

[X̄ = ..., Ȳ = 12, r = +0.816]

59. Compare and contrast the roles of correlation and regression in studying
the independence of two variates.

For 10 observations on price (p) and supply (S) the following data were
obtained in appropriate units :

Σp = 130, ΣS = 220, Σp² = 2288, ΣS² = 5506,
ΣpS = 3467

Obtain the line of regression of S on p and estimate the supply when the price
is 16 units, and find out the standard error of the estimate.

(M.A. Econ., Delhi, 1969; M. Com., Delhi, 1970)
[S = 1.015p + 8.805; when price is 16, supply would be 25]
60. The following table shows the mean and standard deviation of the
prices of two shares on the Bombay Stock Exchange :

Share          Mean        Standard Deviation
A Co. Ltd.     Rs. 39.5    Rs. 10.8
B Co. Ltd.     Rs. 47.5    Rs. 16.8

If the coefficient of correlation between the prices of the two shares is 0.42,
find the most likely price of share A corresponding to a price of Rs. 56 observed
in the case of share B.
(M. Com., Delhi, 1971; M. Com., Nagpur, 1972)
[X = 0.268Y + 26.73; X56 = ...]

61. For a bivariate data,
the mean value of x = 53.2
the mean value of y = 27.9
the regression coefficient of y on x = -1.5
the regression coefficient of x on y = -0.2
Find (i) the most probable value of y when x = 60, (ii) r, the coefficient of
correlation. (B. Com., Bombay, 1972)
[Y = -1.5X + 107.7; Y60 = 17.7; r = -0.547]
62. The following data are given for marks in Mathematics and Hindi in
the departmental examination of U.P. in 1970 :
Mean marks in Hindi                  40
Mean marks in Mathematics            47
S.D. of marks in Hindi               10.5
S.D. of marks in Mathematics         17
Coefficient of correlation between marks in Hindi and Mathematics
+0.43

Form the two lines of regression and explain why there are two regression
equations. Calculate the expected average marks in Mathematics of candidates
who received 50 marks in Hindi. (M. Com., Allahabad, 1970)

[Y = 0.696X + 19.16; Y50 = 53.96; X = 0.266Y + 27.498]


63. Following is the distribution of students according to their height and
weight :

Height           Weight (lb.)
(inches)     90-100   100-110   110-120   120-130
50-55           4         7         5         2
55-60           6        10         7         4
60-65           6        12        19         7
65-70           3         8         6         3

Calculate (i) the two coefficients of regression, and (ii) obtain the two
regression equations. (M. Com., Delhi, 1973)
[(i) bxy = 0.0405, byx = 0.152; (ii) Y = 0.152X + 99.73, X = 0.0405Y + 55.93]

64. (a) If x = 0.85y and y = 0.89x and σx = 3, calculate σy and the
coefficient of correlation. (M.A. Econ., Punjab, 1973)
[r = 0.87, σy = 3.07]

(b) Find the two lines of regression (i) of x on y, and (ii) of y on x
from the following data. Also find the coefficient of correlation between x and y.

x    15   27   27   30   34   38   46
y    12   15   15   18   18   22   26

Estimate the value of x corresponding to value 25 of y.
[x = 2y - 5; y = 0.469x + 3.461; x25 = 45] (M. Com., Meerut, 1973)
65. From the following data determine :
(i) the two regression equations,
(ii) the value of X when Y is 25 and the value of Y when X
is 20, and
(iii) the value of the correlation coefficient :
X    4    6   11    7   12
Y    5    7   10   12   16

[X = 0.649Y + 1.51; Y = 1.04X + 1.68
X25 = 17.735, Y20 = 22.48, r = +0.822]
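Exercises 65 to 67 all ask for both regression lines and r from raw data. A sketch for Exercise 65; `both_regressions` is a hypothetical helper returning the means, the two regression coefficients and r.

```python
import math

def both_regressions(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy, r) from paired observations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return x_bar, y_bar, sxy / sxx, sxy / syy, r

X = [4, 6, 11, 7, 12]
Y = [5, 7, 10, 12, 16]
x_bar, y_bar, b_yx, b_xy, r = both_regressions(X, Y)
y_at_20 = y_bar + b_yx * (20 - x_bar)   # Y when X = 20 (line of Y on X)
x_at_25 = x_bar + b_xy * (25 - y_bar)   # X when Y = 25 (line of X on Y)
print(round(b_yx, 3), round(b_xy, 3), round(r, 3), round(y_at_20, 2), round(x_at_25, 2))
```

This prints `1.043 0.649 0.823 22.52 17.73`; the printed 22.48 and 17.735 come from using the coefficients rounded to 1.04 and 0.649.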
66. You are given the following data :
X    10   14   16   20   15
Y    30   32   38   35   40
(i) Calculate the two regression equations.
(ii) Find X if Y is 50 and Y if X is 30.
(iii) Calculate the value of the correlation coefficient.
[X = 0.456Y - 0.96
Y = 0.596X + 26.06
Y30 = 43.94, X50 = 21.84, r = 0.521]
67. Calculate the two regression equations and find the value of the
correlation coefficient from the following data :
X    10   13   16   19   21   22
Y     5    6    7    9   11   13
[X = 1.463Y + 4.37
Y = 0.627X - 2.065
r = +0.954]
68. Calculate the two regression equations from the following information
and find X when Y is 100 :

                      X     Y
Mean                 25    30
Standard deviation    5     7
r = +0.85
[X = 0.607Y + 6.79
Y = 1.19X + 0.25
X100 = 67.49]
69. Fit a regression line of the form Y = a + bX to the following data :
X      1     2     3     4     5     6
Y    166   184   142   180   328   296
[Y = 32X + 104] (M. Com., Meerut, 1975)
70. Calculate the regression coefficients for the data given below :
X    8    6    4    7    5
Y    9    8    5    6    2

(B.A. Hons. Econ., Delhi, 1974)

(bxy = 0.4, byx = 1.2)

71. Calculate the regression equations of Y on X and X on Y from the
following data :

Height of father (inches) X :   65  63  67  64  68  62  70  66  68  67  69  71
Height of son (inches) Y :      68  66  68  65  69  66  68  65  71  67  68  70

(X = -3.38 + 1.036Y; Y = 35.82 + 0.476X)
72. The regression equation between p, the price of a commodity, and d,
the quantity demanded, is d = 12 - 0.4p. Plot the demand function on graph
paper. What is the elasticity of demand at p = 10?

(B.A. Hons. Econ., Delhi, 1975)
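For Exercise 72, the point price elasticity of demand is e = (dd/dp) × (p/d). With d = 12 - 0.4p the slope is -0.4 everywhere, so only p/d changes along the line. A minimal sketch; the function names are hypothetical.

```python
def point_elasticity(slope, p, q):
    """Point price elasticity of demand: (dq/dp) * (p / q)."""
    return slope * p / q

def demand(p):
    return 12 - 0.4 * p      # d = 12 - 0.4p, the equation in Exercise 72

p = 10
e = point_elasticity(-0.4, p, demand(p))
print(round(e, 3))           # -0.5
```

At p = 10 the quantity demanded is 8, giving e = -0.5, so demand is inelastic at that price (|e| < 1).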

73. (a) What is meant by the simple linear regression model? Under what
assumptions are the parameters of the model estimated?

(b) The following random sample gives the number of hours of study (X)
and grades (Y) in an examination for 12 students.

X :    3    3    4    4    4    5    5    5    6    6    7    8
Y :   40   55   55   60   75   70   80   75   90   80   75   85

Fit a regression line to the data and test the significance of b at 5% level
and 1% level. [Y = 7.5X + 32.5] (M. Com., Delhi, 1975)
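A sketch of the usual t test for the slope in Exercise 73(b): t = b / SE(b) with SE(b) = √(s² / Σ(X - X̄)²) and s² the residual variance on n - 2 degrees of freedom. Assumption: the first hours value is illegible in the source; 3 is used because it reproduces the printed line Y = 7.5X + 32.5. The critical values 2.228 and 3.169 are the standard two-tailed table values of t for 10 degrees of freedom.

```python
import math

# ASSUMPTION: the first X value (illegible in the source) is taken as 3
X = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8]
Y = [40, 55, 55, 60, 75, 70, 80, 75, 90, 80, 75, 85]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
syy = sum((y - y_bar) ** 2 for y in Y)

b = sxy / sxx
a = y_bar - b * x_bar
residual_ss = syy - b * sxy      # unexplained sum of squares
s2 = residual_ss / (n - 2)       # residual variance, n - 2 d.f.
se_b = math.sqrt(s2 / sxx)       # standard error of the slope
t = b / se_b                     # t statistic for H0: beta = 0

# Two-tailed critical values of t with 10 d.f. (standard t tables)
T_CRIT_5PC, T_CRIT_1PC = 2.228, 3.169
print(f"Y = {b:.1f}X + {a:.1f}, t = {t:.2f}")
print("significant at 5%:", abs(t) > T_CRIT_5PC, "| at 1%:", abs(t) > T_CRIT_1PC)
```

With t about 4.06, b is significant at both levels under these assumptions.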


SECTION 12 
ASSOCIATION OF ATTRIBUTES 


1. What is meant by Association of Attributes ? How does it differ from 
correlation ? (B. Com., Delhi, 1969) 


2. Define 'Association' and 'Disassociation' between two attributes.
(M. Com., Delhi, 1965)


3. Explain what you understand by association of attributes. How is
it measured? (B.Sc., Madras, 1970)

4. (a) Discuss the concepts of association between two attributes and
correlation between two variables.

(b) How will you examine the consistency of data classified according to
different attributes? Give a set of conditions of consistency in the case of two
attributes. (M.A. Econ., Delhi, 1971)
5. When are two attributes said to be
(a) independent, 
(b) positively associated, and 
(c) negatively associated ? 


What are the conditions to be satisfied by class frequencies in each of the above 
cases ? (M.A. Econ., Delhi, 1970) 


6. Explain the terms 'order of a class frequency' and 'ultimate class
frequency' in a contingency table. Demonstrate how a class frequency of any
order can be expressed in terms of ultimate class frequencies and discuss
conditions of consistency in a contingency table. (I.A.S., 1967)

7. Explain the concept of (i) complete association, (ii) partial associa- 
tion, (iii) complete disassociation, and (iv) independence so as to make the diffe- 
rence between them clear. (M. Com., Delhi, 1968) 


8. (a) State the criteria for consistency of data collected for studying
association of two attributes. (M. Com., Delhi, 1969)

(b) What do you understand by association of attributes? How will you
examine the consistency of data classified according to different attributes?
(M.A. Econ., Meerut, 1973)


9. You are given the following information :
(A) = 400, (AB) = 250, (B) = 500, N = 1,200.
Prepare the nine-square table and find the missing frequencies.
[(Aβ) = 150, (αB) = 250, (αβ) = 550, (β) = 700, (α) = 800]
10. Find the missing frequencies from the following data :
(αβ) = 500, (B) = 600, (α) = 800, (β) = 1,000.
[(AB) = 300, (Aβ) = 500, (αB) = 300, (A) = 800, N = 1,600]
11. Is there any inconsistency in the data given below?
(a) N = 12,000, (A) = 600, (AB) = 400, (B) = 500.
(b) (AB) = 200, N = 1,000, (A) = 150, (B) = 300. [(a) No. (b) Yes]
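Exercises 9 to 11 rest on the same bookkeeping: with two attributes, N, (A), (B) and (AB) fix every other cell of the nine-square table, and the data are consistent only if no class frequency comes out negative. A sketch with hypothetical helpers; Greek classes are spelled out ('alpha' for not-A, 'beta' for not-B).

```python
def nine_square(n, a, b, ab):
    """Complete the nine-square (2 x 2) table from N, (A), (B) and (AB)."""
    t = {"N": n, "A": a, "B": b, "AB": ab}
    t["alpha"] = n - a                           # (alpha)
    t["beta"] = n - b                            # (beta)
    t["Abeta"] = a - ab                          # (A beta)
    t["alphaB"] = b - ab                         # (alpha B)
    t["alphabeta"] = t["alpha"] - t["alphaB"]    # (alpha beta)
    return t

def is_consistent(table):
    """Two-attribute data are consistent only if no class frequency is negative."""
    return all(value >= 0 for value in table.values())

ex9 = nine_square(1200, 400, 500, 250)
print(ex9["Abeta"], ex9["alphaB"], ex9["alphabeta"], ex9["beta"], ex9["alpha"])
# Exercise 11: (a) consistent; (b) inconsistent, since (AB) > (A) makes (A beta) negative
print(is_consistent(nine_square(12000, 600, 500, 400)),
      is_consistent(nine_square(1000, 150, 300, 200)))
```

The first line prints `150 250 550 700 800` and the second `True False`.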


12. (i) Investigate the association between darkness of eye-colour in
father and son from the following data :

Fathers with dark eyes and sons with dark eyes = 50
Fathers with dark eyes and sons with not dark eyes = 79
Fathers with not dark eyes and sons with dark eyes = 89
Fathers with not dark eyes and sons with not dark eyes = 782

(ii) What are the conditions for testing the consistency of observations of two
attributes A and B? (B.A. Hons. Econ., Delhi, 1974)
(iii) What would have been the frequency of fathers with dark eyes and sons
with dark eyes for the same total number, had there been complete independence?
[(i) Q = 0.695, (iii) 18]
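Yule's coefficient of association, Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)], is the measure used in Exercise 12 and in most of the exercises that follow (14, 16, 18, 24, 27 and others). A minimal sketch; the argument order (AB), (Aβ), (αB), (αβ) is a convention chosen here. Under independence the expected (AB) is (A)(B)/N.

```python
def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's Q from the four ultimate class frequencies."""
    ad = ab * alpha_beta
    bc = a_beta * alpha_b
    return (ad - bc) / (ad + bc)

# Exercise 12: A = dark-eyed father, B = dark-eyed son
ab, a_beta, alpha_b, alpha_beta = 50, 79, 89, 782
q = yule_q(ab, a_beta, alpha_b, alpha_beta)
n = ab + a_beta + alpha_b + alpha_beta
expected_ab = (ab + a_beta) * (ab + alpha_b) / n   # (A)(B)/N under independence
print(round(q, 3), round(expected_ab, 1))          # 0.695 17.9
```

For example, Exercise 14 gives `yule_q(200, 100, 200, 400)`, which is 0.6.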


13. From the data given below, compare the association between literacy
and unemployment in the rural and urban areas and give reasons for the
difference, if any :

                                    Rural         Urban
Total number of adult males       25 lakhs     200 lakhs
Literate males                     10 ,,         40 ,,
Unemployed males                    5 ,,         12 ,,
Literate and unemployed males       3 ,,          4 ,,

(M. Com., Agra, 1973; M.A. Econ., Jabalpur, 1974)
[Rural areas Q = +0.472
Urban areas Q = +0.357]
14. Out of 900 persons, 300 were literate and 400 had travelled beyond
the limits of their district; 200 of the literates were among those who had
travelled. Is there any relation between travelling and literacy? [Q = +0.6]


15. What do you understand by 'contingency'? In an investigation relating
to health and nutrition of children between the ages of one and five years, two
groups of children were compared, one belonging to the well-to-do class, 125 in
number, and the other belonging to the poor class, 124 in number. The following
results were obtained :

                          Poor children    Well-to-do children
                           (per cent)          (per cent)
Below normal weight            75                 23
Above normal weight            25                 42
Find the coefficient of association between the weight of children and
their parents' financial condition. [Q = +0.693]


16. Define the coefficient of association of two attributes. Obtain such a
measure between unemployment and educational attainments from the following
results of an urban survey :

                               Employed    Unemployed
Illiterate or below matric      5,997         432
Matric and above                  572          96

(M. Com., Delhi; M. Com., Agra, 1968)
[Q = +0.399]
17. A census revealed the following figures of the blind and the insane in
two age groups in a certain population :

                             Age group        Age group
                            15-25 years     Over 25 years
Total population              270,009          160,200
Number of blind                 1,000            2,000
Number of insane                6,000            1,000
Number of insane among
the blind                          19                9

(a) Obtain a measure of the association between blindness and insanity in
each of the two age groups.
(b) Do you consider that blindness and insanity are associated or disassociated
with each other in the two groups, or more in one age group than in the
other? (M. Com., Agra, 1969)
[Age group 15-25 years: Q = -0.08
Age group over 25 years: Q = -0.165]
18. 2,000 candidates appeared for a competitive examination and of these
600 were successful. 350 had attended a coaching class and of these 200 came
out successful. Estimate the utility of the coaching class.
(Q = 0.613; coaching is useful)
19. Out of 5 lakh literates in a particular district of India, the number of
criminals was 2,000. Out of 50 lakh illiterates in the same district, the number
of criminals was 80,000. On the basis of these figures, do you find any
association between illiteracy and criminality?
(Positive association between illiteracy and criminality)
20. Find out the coefficient of association between the type of college
training and success in teaching from the following table :

                      Successful    Unsuccessful    Total
Teachers College          75             55          130
University               125             45          170
Total                    200            100          300

Also calculate the coefficient of contingency.


(C-0025 J 
21. Calculate the coefficient of contingency from the following data :

Social status     Dull    Intelligent    Brilliant
Lower Middle       22         35            23
Middle             38         70            32
Upper Middle       60         20            20
(M. Com., Delhi)
(C = 0.315)
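Pearson's coefficient of contingency, C = √(χ² / (χ² + N)), used in Exercises 21 and 23, needs the expected frequencies (row total × column total / N) for every cell. A sketch with a hypothetical helper `contingency_coefficient` that returns both C and χ².

```python
import math

def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)); returns (C, chi2)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # under independence
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n)), chi2

status_by_ability = [
    [22, 35, 23],   # Lower middle: dull, intelligent, brilliant
    [38, 70, 32],   # Middle
    [60, 20, 20],   # Upper middle
]
c, chi2 = contingency_coefficient(status_by_ability)
print(round(chi2, 2), round(c, 3))   # 35.17 0.315
```

The same function applied to the father-son height table of Exercise 23 gives C of about 0.342.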


22. Explain the concept of Association and Independence of attributes as
applied to a ... table. Discuss the different coefficients proposed
for measuring association. Use one of them for measuring the association
between proficiency in English and in Hindi among candidates at a certain test if
245 of them passed in Hindi, 285 failed in Hindi, 190 failed in Hindi but passed
in English, and 147 passed in both. (M.A. Econ., Agra)
23. The table below shows the association among fathers and sons between
their heights. Find the coefficient of contingency between the two :

                  Fathers
Sons       Tall    Medium    Short    Total
Tall        25       20        15       60
Medium      10       15        35       60
Short       20       40        20       80
Total       55       75        70      200

(C = 0.342)


24. At an examination at which 600 candidates appeared, boys outnumbered
girls by 16 per cent of all candidates. The number of passed candidates
exceeded the number of failed candidates by 310. Boys failing in the examination
numbered 88. Construct the two-fold association table and calculate the
coefficient of association between male sex and success in the examination.

(M.A. Econ., Meerut, 1973)
(Q = -0.07)
25. Investigate the association between eye colour in father and son from
the following data :
Fathers with dark eyes and sons with dark eyes = 100
Fathers with not dark eyes and sons with dark eyes = ...
Fathers with dark eyes and sons with not dark eyes = 160
Fathers with not dark eyes and sons with not dark eyes = 1,560

(M.A. Econ., Agra, 1968)

[Q = 0.7688]
26. In order to ascertain if marriage has any effect on the examination
results of students, 1,000 students were selected at random. There were Hindus,
Muslims and Christians. Of the 1,000 students 375 were married, the others
unmarried. Of the married students, 167 passed, and of the unmarried students,
203 failed.

Set out these facts in the form of a table, filling as many spaces in the
table as you can. (B. Com., B.H.U., 1969)


27. Test for association between extravagance in fathers and extravagance
in sons from the following data :

Extravagant fathers with extravagant sons = 327
Miserly fathers with extravagant sons = 741
Extravagant fathers with miserly sons = 545
Miserly fathers with miserly sons = 234   (B.Sc., Madras, 1970)

(Q = -0.681)


28. Given the following frequencies, find out the frequencies of the positive
and negative classes and the whole number N :

(ABC) = 28,  (αBC) = 20,  (ABγ) = 19,
(αβγ) = 25,  (AβC) = 30,  (αβC) = 23,
(Aβγ) = 26,  (αBγ) = 70.

[(C) = 101, (γ) = 140, ...]
29. A college submitted the following returns to the University office :
No. of students appearing in all the three tests = 350
No. passing the first test = 125
No. passing the second test = 135
No. passing the third test = 145
No. passing all the three tests = 20
No. failing all the three tests = 75
No. passing the first two but failing the third = 25
No. failing the first two but passing the third = 60

From the information given above find out the number of students passing at
least two tests. (110)
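One way to get the count in Exercise 29 without building the full table: (A) + (B) + (C) counts each student once per test passed, so it equals s1 + 2·s2 + 3·s3, while N - (αβγ) = s1 + s2 + s3, where s_k is the number passing exactly k tests. Subtracting isolates s2 + 2·s3. A sketch with a hypothetical helper; the two remaining returns (25 and 60) are not needed for this count.

```python
def at_least_two(n, a, b, c, abc, none):
    """Number possessing at least two of three attributes."""
    at_least_one = n - none          # s1 + s2 + s3
    s3 = abc                         # exactly three
    # (a + b + c) = s1 + 2*s2 + 3*s3, so subtracting at_least_one leaves s2 + 2*s3
    s2 = (a + b + c) - at_least_one - 2 * s3
    return s2 + s3

print(at_least_two(350, 125, 135, 145, 20, 75))   # 110
```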


30. What is partial association? In a town of 1,00,000 adult population,
52,000 were males and 48,000 females, distributed according to education and
employment as follows :

                               Males    Females
                               (000)     (000)
Educated and employed            38         6
Educated and unemployed           2        14
Uneducated and employed           4        18
Uneducated and unemployed         8        10

Is there any connection between education and employment in the two
groups as well as in the total population? Interpret the results in the light of
actual customs.
[Q(AB.C) = +0.95 (males)
Q(AB.γ) = -0.61 (females)
Q(AB) = +0.38 (total)]


31. A market investigator returns the following data of 1,000 people
consulted :

400 liked chocolates
290 liked toffees
480 liked lemondrops
380 liked chocolates and toffees
350 liked chocolates and lemondrops
370 liked toffees and lemondrops.

Show that the information as it stands must be incorrect.
(M. Com., Allahabad, 1967)
(Hint. (AB) + (BC) - (AC) > (B))
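When (ABC) is not given, as in Exercise 31, the standard consistency conditions for three attributes are the four inequalities checked below (they assume the two-attribute conditions such as (AB) ≤ (A) have already been verified). A sketch; `three_attribute_violations` is a hypothetical helper that lists whichever conditions fail.

```python
def three_attribute_violations(n, a, b, c, ab, ac, bc):
    """Return the three-attribute consistency conditions that fail."""
    checks = {
        "(AB)+(AC)+(BC) >= (A)+(B)+(C)-N": ab + ac + bc >= a + b + c - n,
        "(AB)+(AC)-(BC) <= (A)": ab + ac - bc <= a,
        "(AB)+(BC)-(AC) <= (B)": ab + bc - ac <= b,
        "(AC)+(BC)-(AB) <= (C)": ac + bc - ab <= c,
    }
    return [rule for rule, ok in checks.items() if not ok]

# Exercise 31: A = chocolates, B = toffees, C = lemondrops
violations = three_attribute_violations(1000, 400, 290, 480, 380, 350, 370)
print(violations)   # ['(AB)+(BC)-(AC) <= (B)']
```

Here (AB) + (BC) - (AC) = 400 exceeds (B) = 290, which is exactly the hint.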


32. In a report on consumers' preference it was given that out of 500
persons surveyed 410 preferred variety A, 380 preferred variety B and 270 persons
liked both. Are the data consistent? (M. Com., Delhi, 1971)

[(αβ) = -20; No]

33. The male population of a certain State in India is 331 lakhs. The
number of literate males is 66 lakhs and the number of male criminals is 33,000.
If the number of literate male criminals is 6,000, calculate the coefficient of
association between literacy and criminality in the State or show that the data
are inconsistent. (M. Com., Allahabad, 1973)


34. Do you find any inconsistency in the following data made available
for investigating the eye colour of brothers and sisters?
Brothers with light eyes and sisters with not light eyes 414.
Brothers with not light eyes and sisters with light eyes 260.
Brothers with not light eyes and sisters with not light eyes 238.
Brothers with light eyes 400. (M. Com., Allahabad, 1969)
[Yes, (AB) is negative]


35. Out of the 70,000 literates in a particular district of India, the number
of criminals was 500. Out of 9,30,020 illiterates in the same district, the number
of criminals was 15,009. On the basis of these figures, do you find any
association between illiteracy and criminality?

(M. Com., Jaipur, 1972; M. Com., Indore, 1968;
M. Com., Nagpur, 1969)

[(AB)/(A) × 100 = 1.6%; (αB)/(α) × 100 = 0.7%; association is positive]

36. There was an outbreak of small-pox in a certain locality. There
were 254 persons who were not inoculated at all, out of which 20 contracted the
disease and 12 of them died. Out of 294 persons who were inoculated, 3
contracted the disease but none of them died. Find the coefficient of association
between (a) inoculation and contracting small-pox, and (b) inoculation and
mortality among those who contracted the disease. (M. Com., Allahabad, 1970)


37. Do you find any inconsistency in the data given below :
N = 2,000, (A) = 1,000, (B) = 1,000, (C) = 900,
(AB) = 400, (BC) = 500, (AC) = 300, (ABC) = 240.
(Yes, (αβγ) = -40)
38. Calculate the coefficient of contingency from the data given below :

        A1    A2    A3    A4
B1      11     6     2     1
B2       5    12    15     8
B3       -     2     3    15

(M.A., Meerut, 1969)
(C = 0.5...)
39. In an assortative mating study to find whether tall husbands tend to
marry tall wives, the following information about the wives of 125 tall and 125
short-statured husbands was published :

               Tall husbands    Short husbands
                (per cent)        (per cent)
Tall wives          56                13
Short wives         11                48
Find the coefficient of association between the stature of wives and
husbands, ignoring medium-sized wives. (I.A.S., 1966)
(Q = +0.90)
40. In a market survey of 10,000 people it was found that 1,110 liked
chocolates, 7,520 liked toffees, 4,180 liked boiled sweets, 5,700 liked chocolates
and toffees, 3,500 liked chocolates and boiled sweets, 3,480 liked toffees and
boiled sweets and 2,970 liked all the three. Show that this information as it
stands must be incorrect. (M. Com., Banaras, 1972)

[(ABC) = 2,910 given, but (ABC) cannot exceed 2,810]
41. The following table gives the number of persons suffering from certain
infirmities in Bengal in 1961. Trace the association between insanity and
deaf-muteness for males and females in Bengal separately.

            Total No.     Insane    Deaf-mutes    Deaf-mutes and insane
Males       260 lakhs     12,650      21,301             545
Females     241 lakhs      9,055      14,136             317

(M.A. Econ., Agra, 1967)
[Q = +0.965 (males)
Q = +0.968 (females)]
42. (a) What is illusory association? What is the difference between
partial association and illusory association?
(b) Can vaccination be regarded as a preventive measure for small-pox
from the following data :
"Of 1,482 persons living in a locality, 368 in all were attacked by small-pox.
Of 1,482 persons, 343 had been vaccinated and of this group only 35
were attacked." (M. Com., ..., 1970)

[Percentage attacked: vaccinated 35/343 × 100 = 10.2%; not vaccinated
333/1,139 × 100 = 29.2%]


43. In a report on consumers' preference it was given that out of ...
persons surveyed 410 preferred variety A, 380 preferred variety B, and ...
liked both. Are the data consistent? (M. Com., Delhi, 1971)
(No, since (αβ) = -230)


44. Do you find any association between the temperaments of brothers and
sisters from the following data?

Good-natured brothers and good-natured sisters     1,230
Good-natured brothers and sullen sisters             850
Sullen brothers and good-natured sisters             530
Sullen brothers and sullen sisters                   980   (M. Com., Nagpur, 1972)
(Q = 0.45)


45. In a very hotly fought battle
70% at least of the combatants lost an eye,
75% at least lost an ear,
80% at least lost a leg,
85% at least lost an arm.
What percentage at least lost all four? (M. Com., Allahabad)
(10%)
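The reasoning behind the 10% is a bound on the overlap. At most 30% escaped losing an eye, 25% an ear, 20% a leg and 15% an arm, so at most 90% can have escaped at least one of the four injuries. The remaining 10% at least must have lost all four. A minimal sketch of that bound, using an illustrative function name:

```python
def least_percentage_with_all(minimum_shares):
    """Smallest percentage that must possess every attribute, given the
    minimum percentage (0-100) possessing each attribute separately."""
    # Largest percentage that can lack at least one attribute.
    shortfall = sum(100 - s for s in minimum_shares)
    return max(0, 100 - shortfall)


print(least_percentage_with_all([70, 75, 80, 85]))  # 10
```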
46. A survey of 1,000 companies gives the following data:
(i) Cos. with a capital of more than Rs. 10 lakhs = 510
(ii) Cos. making profit = 490
(iii) Cos. under Managing Agents = 427
(iv) Cos. with a capital of more than Rs. 10 lakhs and making profit = 189
(v) Cos. with a capital of more than Rs. 10 lakhs and under Managing Agents = 160
(vi) Cos. making profits and under Managing Agents = 85
Show that the information as it stands must be incorrect. (M. Com., Nagpur, 1971)
47. Test the consistency of the following data:
N = 2,030, (A) = 1,500, (B) = 100, (AB) = 350 (B. Com., Poona, 1973)
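For two attributes the test is the same idea on a smaller scale: the four ultimate frequencies (AB), (Aβ), (αB) and (αβ) must all be non-negative. A sketch with an illustrative function name, applied to the figures of Question 47 as printed:

```python
def check_two_attributes(n, a, b, ab):
    """Return (consistent, cells) for a two-attribute classification."""
    cells = {
        "(AB)": ab,
        "(Aβ)": a - ab,
        "(αB)": b - ab,
        "(αβ)": n - a - b + ab,
    }
    return all(v >= 0 for v in cells.values()), cells


ok, cells = check_two_attributes(2_030, 1_500, 100, 350)
print(ok, cells)  # (αB) = 100 - 350 is negative, so the data are inconsistent
```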


48. (a) What do you understand by the term 'association of attributes'? What do you understand by 'complete association' and 'complete disassociation'? How do you measure the intensity of association between two attributes?

(b) In two towns, A and B, the following information was supplied by an investigator:

                                     Town A    Town B
Total population (in thousands)        240       234
Literates                               40        34
Illiterate criminals                    40        20
Literate criminals                       5         2

Compare the degree of association between literacy and crime in each of the two towns. (B.A. Hons. Econ., Delhi, 1973)


49. (a) 1,660 candidates took an examination; 422 were successful. 256 had attended a coaching class, and of these 150 came out successful. Estimate the utility of the coaching class by finding the coefficient of association.

(b) If a report gives the following frequencies as actually observed, show that there must be a misprint or a mistake of some sort. Possibly the statement frequency (BC) = 85 is wrong.

(c) Show that in a classification with two attributes:

(AB) … (A) (B) … N (M. Com., Meerut, 1973)

50. From the data given below, find out if the attributes A and B are independent, positively associated or negatively associated:

(A) = 60, (B) = 53, (AB) = 35, N = 80
(M.A. Econ., Jabalpur, 1974)
[Negatively associated]
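The criterion here is to compare the observed (AB) with (A)(B)/N, the value expected if A and B were independent. For Question 50 the expected value is 60 × 53 / 80 = 39.75, which exceeds the observed 35, so the association is negative. The sketch below, with an illustrative function name, compares N(AB) with (A)(B) so that everything stays in whole numbers:

```python
def association_type(n, a, b, ab):
    """Classify two attributes by comparing (AB) with (A)(B)/N."""
    observed = n * ab   # N x (AB)
    expected = a * b    # (A) x (B)
    if observed > expected:
        return "positively associated"
    if observed < expected:
        return "negatively associated"
    return "independent"


print(association_type(80, 60, 53, 35))
```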


SECTION 13 


INDEX NUMBERS 


1. (a) What is an index number? What is the importance of index num- 
bers in economic and commercial studies ? (B. Com., Andhra, 1966) 


(b) Define an index number and mention its uses 
(B. Com., Delhi, 1972) 


2. (a) Explain the term ‘index number’. Why are index numbers called economic barometers? (B. Com., Bombay, 1970; B. Com., Kurukshetra, 1974)


3. ‘Index numbers are used to measure the changes in some quantity which we cannot observe directly.’ Explain the above statement and point out the uses and limitations of index numbers. (B. Com., Agra, 1969)

4. (a) What are index numbers? Examine the various problems involved in the construction of index numbers. (M.B.A., Delhi, 1973)


(b) Discuss the problems of (i) selection of a base period, (ii) selection of weights in the construction of index numbers. (B.A. Hons. Econ., Delhi, 1975)


6. "Index numbers are the signs and guide-posts along the business highway that indicate to the businessman how he should drive or manage his affairs." Explain the above statement and also point out the relative advantages of the various types of averages as applied to index numbers. Which would you prefer and why? (B. Com., Agra, 1972)

7. Explain in detail the use of index numbers (i) in analysing business conditions, (ii) in providing indices of economic activity, and (iii) in computing real wages.

8. (a) Discuss the general methods used for construction of an index number and discuss the problems of selection of base period and the methods of averaging to be used. Discuss the advantages and disadvantages of a chain index over a fixed index. (B. Com., Bombay, 1970)
(b) Explain Fisher’s Ideal method of constructing index numbers and 
comment on its utility. (B. Com., Bombay, 1971) 


9. (a) What is an index number ? Examine the various problems involved 
in the construction of index numbers. Also briefly describe their uses in business. 
(B.Com, Pass, Delhi, 1974). 
(b) “In the construction of index numbers the advantages of geometric 
mean are greater than those of the arithmetic mean." Discuss. 
(B. Com., Mysore, 1967) 
10. (a) "Index numbers are economic barometers." Explain this statement and state the precautions which should be taken in making use of published index numbers. (M.A. Econ., Meerut, 1975)


(b) "Fisher's Ideal Index Number is a compromise between two well-known indices, not a right compromise, economically speaking." Discuss, using diagrams suitably. (M.A. Econ., Meerut, 1971)


11. (a) Discuss the problem of assigning weights in the construction of weighted price index numbers. Distinguish in this connection between Laspeyre's and Paasche's index numbers. (B.A. Hons. Econ., Delhi, 1972)


(b) “The real problem for the maker of index numbers is whether he shall leave weighting to chance or seek to rationalise it.” Distinguish clearly between chance weighting and rational weighting. Also discuss whether Fisher's Ideal formula offers a rational system of weighting. (M. Com., Allahabad, 1970)


12. (a) What is implied by ‘weighting’ in the process of index number construction? Why is it necessary? What are the commonly proposed weighting schemes? (B. Com., Delhi, 1969)


(b) Explain the uses of Index Numbers. Illustrate with examples the current-weighted and base-weighted index numbers. (B.A. Hons. Econ., Delhi, 1969)


13. Discuss the conditions which an ideal index number should satisfy. Taking any two index numbers, demonstrate which of the above conditions are satisfied by them. (B. Com., Madras, 1973)
14. (a) Distinguish between any of the following two pairs:
(i) Laspeyre's and Paasche's index number formulae.
(ii) Time Reversal and Factor Reversal Tests. (B. Com., Delhi, 1968)

(b) "Laspeyre's formula has an upward bias and Paasche's formula a downward bias." Explain.

15. Explain index numbers. What tests is a good index number expected to satisfy? What is a bias in an index number? What problems are involved in the construction of an index number of prices? Point out clearly the importance …

16. What are index numbers? Explain briefly the methods of computing index numbers:
(i) by the simple average of relatives method,
(ii) by the weighted aggregative method,
(iii) by the simple aggregative method.
Give an example for each method.

"The simple arithmetic average of relatives index number of prices has an upward bias." Discuss. (I.A.S., 1969)


17. (a) Explain clearly
(i) Time Reversal Test, and
(ii) Factor Reversal Test.

Demonstrate how Fisher's Ideal Index Formula satisfies both these tests. Should other formulae for computing index numbers be rejected because they do not satisfy these tests? (M. Com., Delhi, 1968)

18. Explain the significance of the time reversal test and the factor reversal test in index number formulae, and examine Fisher's Ideal Index Number Formula in the light of these tests. (B.A., Bombay, 1970)

(a) “Index numbers are said to be barometers of economic activities.” 
Elucidate. 


(b) Discuss the various problems involved in the construction of index numbers. (B.A. Hons. Econ., Kurukshetra, 1975)

19. (a) Discuss the utility of an ‘index number’. Compare the advantages and disadvantages of ‘fixed base’ and ‘chain base’ index numbers.

(b) Describe the chain base method of construction of index numbers and discuss its advantages and disadvantages as compared with the fixed base method.

(c) Why is Fisher's Index called Ideal? (B. Com., Kerala, 1969)
20. (a) What is meant by cost of living index number? Explain the difficulties in its construction. (B. Com., Poona, 1968)

(b) What points would you take into account in choosing the base and determining the weights in the preparation of cost of living index numbers? (B. Com., Madras, 1969)
21. How are consumer price index numbers constructed? Describe the various steps with illustrations. (B. Com., Lucknow, 1966)
22. (a) Explain the various systems of weighting used in the construction of cost of living indices, indicating the formulae applied in each case. (M. Com., Delhi, 1967)
(b) Explain briefly the nature and use of index numbers.
(c) What procedure should be followed in calculating an index of cost of living?
(d) How is the index of retail prices different from the index of cost of living? (I.C.W.A., 1968)
23. (i) Examine the factors that guide the choice of the base period for the construction of index numbers.
(ii) Describe the three tests that a good index number should satisfy. (B. Sc., Madras, 1970)


24. (a) What is the usefulness of price indices? What problems do we face in their construction?

(b) Distinguish between unweighted and weighted index numbers. Enumerate some of the important methods of weighting a price index and discuss their relative merits and demerits. (B.A. Hons. Econ., Delhi, 1975)


25. (a) What is meant by reversibility of an index number? Describe the time and factor reversal tests in the theory of index numbers. Give a formula which satisfies both these tests. (M.A. Econ., Punjab, 1973)

(b) "Link relatives are based on the idea that one series can be converted into another because time reversibility holds." Do you agree? (B.A. Hons. Econ., Delhi, 1970)

26. (a) Explain (i) Base Shifting, (ii) Splicing, and (iii) … (M. Com., Delhi, 1973)

(b) What are the requirements of a good index number? (B. Com., Poona, 1973)

(c) … in this context chain base index … (B.A. Hons. Econ., Delhi, 1974)


27. The following are the prices of six different commodities for 1969 and 1970. Compute a price index by (a) the simple aggregative method, and (b) the average of price relatives method, using both the arithmetic mean and the geometric mean.

Commodities   Price in 1969   Price in 1970
                  (Rs.)           (Rs.)
A                  40              50
B                  60              60
C                  20              30
D                  50              70
E                  80              90
F                 110             110

[Simple aggregative method, P01 = 113.88; Average of price relatives method, P01 = 121.25 (Arithmetic mean), P01 = 119.8 (G.M.)]
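The three methods of Question 27 can be checked with a short Python sketch. The simple aggregative index is Σp1/Σp0 × 100. The other two indices average the price relatives (p1/p0) × 100, once arithmetically and once geometrically (antilog of the mean of the logarithms). The function names are illustrative only.

```python
from math import exp, log

# Prices (Rs.) from Question 27.
p1969 = {"A": 40, "B": 60, "C": 20, "D": 50, "E": 80, "F": 110}
p1970 = {"A": 50, "B": 60, "C": 30, "D": 70, "E": 90, "F": 110}


def simple_aggregative(p0, p1):
    """P01 = (sum of current prices / sum of base prices) x 100."""
    return 100 * sum(p1[k] for k in p0) / sum(p0.values())


def price_relatives(p0, p1):
    """Price relative of each commodity, (p1 / p0) x 100."""
    return [100 * p1[k] / p0[k] for k in p0]


def relatives_am(p0, p1):
    rel = price_relatives(p0, p1)
    return sum(rel) / len(rel)


def relatives_gm(p0, p1):
    # Geometric mean = antilog of the mean of the logarithms.
    rel = price_relatives(p0, p1)
    return exp(sum(log(r) for r in rel) / len(rel))


print(round(simple_aggregative(p1969, p1970), 2),
      round(relatives_am(p1969, p1970), 2),
      round(relatives_gm(p1969, p1970), 2))
```

The geometric mean comes out below the arithmetic mean, as it always does unless all the relatives are equal.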


28. Enumerate the criteria that a good index number should satisfy. Given the following data:

Year    Item        Wheat    Rye    Oats
1966    Price          2       …      …
        Quantity       4      10      1
1968    Price          3       2      2
        Quantity       6       5      4
1970    Price          4       3      2
        Quantity       …       …      …

Calculate Laspeyre's and Paasche's index numbers of prices taking 1966 as base and comment on the two sets of index numbers you obtain.

[Laspeyre's method, P01 = 133.2; Paasche's method, P01 = 148.5]
29. Calculate Fisher's Ideal Index for the following data:

              Base year             Current year
Commodity   Price   Quantity     Price   Quantity
A             …        50          10       …
B             2       100           2      120
C             4        60           6       60
D            10        30          12       24
(B. Com., Poona, 1969)
[P01 = 136.75]


30. Compute an index number of price by a suitable method from the data given below:

              Base year         Current year
Commodity   Price   Value     Price   Value
A             2      20         4      45
B             4      24         5      30
C             6      30         8      40
D             8      40        10      60
Note. Figures are in appropriate units.

(Hint. Obtain quantities by dividing value by price for each commodity.)
[P01 = 140.3]
31. Given the following data, what index number will you use for purposes of comparison? Give reasons.

         Rice           Wheat          Jowar
Year   Price  Qty.    Price  Qty.    Price  Qty.
1957    9.3   100      6.4    11      5.1     5
1967    4.5    90      3.7    10      2.7     2
(B. Com., Madras, 1968)

[P01 = 49.1, Fisher's Index]

32. Using the following data and 1967 as base period, compute series of simple aggregative price and production indices for the two fuels.

                         Producer's price            Production in billions
Item and Unit         1967    1968    1969          1970    1971    1972
Coal (ton)           Rs. 5   Rs. 3   Rs. 4            3       2       2
Crude oil (barrel)   Rs. 2   Rs. 3   Rs. 4            4       4       3

Also calculate price and production indices by employing Fisher's Ideal formula. (B. Com., Delhi, 1968)


33. Annual production (in million tonnes) of four commodities is given below:

                 Production in year        Weights
Commodities    1960    1964    1965
A               160     200      26          20
B                24      42      45          20
C                50      72      68          13
D               250     168     156          17

Calculate quantity index numbers for the years 1964 and 1965 with 1960 as base year, using (i) the simple arithmetic mean and (ii) the weighted arithmetic mean of the relatives. (I.C.W.A., 1967)

(i) Q01 = 146.00
(ii) Q01 = 150.25


34. (a) Working class cost of living index numbers are constructed for two towns, A and B, with the same base year, and it is found that in the current year the index for town A is 120 and that for town B is 132. It is concluded that town B is 10 per cent more costly than town A in the current year for the working class population. Do you agree?

(b) Prices and quantities for the base year and the current year for eight groups of commodities are given below:

              Price                     Quantity
Group   Base year   Current year   Base year   Current year
1          12           20             50          120
2          10            2            100           80
3          14           15             60           70
4          16           18             30           50
5          18           20             40           40
6          22           15             70           60
7          20           16             90          100
8          15           18             80           80

Calculate the index numbers for the current year by
(i) base year weighting, and
(ii) Fisher's Ideal formula. (M.A., Banaras, 1969)
[Laspeyre's method, P01 = 102.3
 Fisher's method, P01 = 105.3]


35. The prices of rice, wheat and pulses for a few successive years are given below:

           Weights   1969   1970   1971   1972   1973
Rice          5       100    110    123    129    138
Wheat         3        92    100    124    116    103
Pulses        1        58     60     73     74     68

Compute index numbers of cereal prices with the above data, using the weights given in the table and taking 1969 as the base year.

[1969     1970     1971     1972     1973
  100   108.84   127.25   127.87   127.01]
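The answer to Question 35 is a weighted arithmetic mean of price relatives, ΣWP / ΣW with P = (p1/p0) × 100 for each item. Reproducing it takes only a few lines of Python. The same function covers any weighted-relatives problem, for instance Questions 44 and 76, once the relatives and weights are filled in. Names are illustrative.

```python
weights = {"Rice": 5, "Wheat": 3, "Pulses": 1}
prices = {
    1969: {"Rice": 100, "Wheat": 92, "Pulses": 58},
    1970: {"Rice": 110, "Wheat": 100, "Pulses": 60},
    1971: {"Rice": 123, "Wheat": 124, "Pulses": 73},
    1972: {"Rice": 129, "Wheat": 116, "Pulses": 74},
    1973: {"Rice": 138, "Wheat": 103, "Pulses": 68},
}


def weighted_relatives_index(base, current, weights):
    """Weighted arithmetic mean of price relatives: sum(W x P) / sum(W),
    where P = (p1 / p0) x 100 for each item."""
    total = sum(weights[item] * 100 * current[item] / base[item] for item in weights)
    return total / sum(weights.values())


index = {year: weighted_relatives_index(prices[1969], p, weights)
         for year, p in prices.items()}
for year, value in index.items():
    print(year, round(value, 2))
```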


36. From the following index numbers prepare new ones by (a) considering the year 1972 as base year and (b) using the chain base method:

Year            1969   1970   1971   1972   1973   1974
Index Numbers    100    110    175    250    300    400

(B. Com., Poona, 1966)
[Year                          1969   1970   1971   1972   1973   1974
 I. No. using 1972 as base       40     44     70    100    120    160
 I. No. using chain base        100    110    175   1501   1801   2399]
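Shifting the base, as in part (a) of Question 36, only requires dividing every index by the index of the new base year and multiplying by 100. A minimal sketch with an illustrative function name:

```python
def shift_base(indices, new_base_year):
    """Recast index numbers so that new_base_year = 100: divide each index
    by the index of the new base year and multiply by 100."""
    base_value = indices[new_base_year]
    return {year: 100 * value / base_value for year, value in indices.items()}


q36 = {1969: 100, 1970: 110, 1971: 175, 1972: 250, 1973: 300, 1974: 400}
for year, value in shift_base(q36, 1972).items():
    print(year, round(value, 1))
```

Questions 48 and 67(b) use the same procedure, with 1965 and 1962 respectively as the new base year.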


37. The following table gives the price relatives with 1963 as base year for three different years. Compute the link relatives for the years 1963-66.

Year                           1963    1964    1965    1966
Price Relatives (1963 = 100)    100   111.7   132.2   119.1
(M. Com., Calcutta, 1966)
[Year              1963   1964    1965    1966
 Link Relatives     100     …    101.9   105.2]

38. From the chain base index numbers given below, prepare fixed base index numbers and verify the calculations:

Year            1962   1963   1964   1965   1966
Index numbers     80    110    120    105     95
[Year                    1962   1963    1964    1965     1966
 Fixed Base Index No.      80     88   105.6   110.9   105.33]


39. Given the index of money earnings of factory workers in India and All-India consumer price index numbers, calculate the index of real earnings:

        Index of money   All-India Consumer
Year       earnings       Price Index Nos.
1961        100.0             100.0
1962        107.1              98.1
1964        107.7              96.2
1967        120.1             105.4
1970        131.4             118.1
Comment on the results.
[Index of Real Earnings
 1961    1962     1964     1967     1970
  100   109.2   111.96    114.3    111.3]


40. The following table gives per capita income and the cost of living index for India from 1959-60 to 1967-68:

          Per capita income   Cost of living index
Year           (Rs.)            (Base 1959-60)
1959-60          67                  100
1960-61          70                  105
1961-62          78                  117
1962-63         112                  160
1963-64         139                  217
1964-65         139                  216
1965-66         137                  219
1966-67         143                  242
1967-68         160                  258

Deflate the money income with reference to the cost of living index.
[100, 99.5, 99.6, 104.5, 95.7, 96.1, 93.4, 88.2, 92.5]


41. Given below are two sets of indices, one with 1949 as base and the other with 1956 as base:

(a) Year   Index No.      (b) Year   Index No.
    1949      100             1956      100
    1950      115             1957      105
    1951      122             1958      118
    1952      150             1959       98
    1953      200             1960      102
    1954      220             1961      105
    1955      240             1962      120
    1956      250             1963      125

[Index (b) spliced to (a)
 Year       1956   1957    1958    1959    1960    1961    1962    1963
 Index No.   250  262.5   295.0   245.0   255.0   267.5   300.0   317.5]

43. The following are the group index numbers and the group weights of an average working class family's budget. Construct the cost of living index number by assigning the given weights:

Group                Index No. of Jan. 1960    Weights
                     (1949 = 100)
Food                        152                  48
Fuel and Lighting           110                   5
Clothing                    130                  19
House Rent                  100                  15

Comment on the significance and use of the index number thus constructed. (B. Com., Madras, 1969)

(Cost of Living Index = 128.29)


44. Explain the significance of the Family Budget Enquiry in the construction of cost of living index numbers.

The weights attached to certain items and the price relatives of these items are given below:

Items           Weights   Price Relatives
Food              55          150
Clothing          25          120
House Rent         8          175
Miscellaneous     12          168

Calculate the cost of living index number based on these data. (B. Com., Andhra, 1973)
(Cost of Living Index = 146.7)


45. What are the uses of the cost of living index number? Calculate the cost of living index number from the following data:

                      Price
Items           Base year   Current year   Weights
Food               30           47            4
Fuel                8           12            1
Clothing           14           18            3
House Rent         22           15            2
Miscellaneous      25           30            1

(B. Com., Madras, 1971)
(Cost of Living Index = 115.84)
46. (a) What are the factors which one has to keep in view in the construction of index numbers of prices?
(b) Is cost of living index also the index of costliness? Explain.
(c) In calculating a cost of living index the following weights were used: Food 8½, Rent 2, Clothing 2½, Fuel and Light 1, Miscellaneous 1.
Calculate the index number for a date when the percentage increases in prices of the various items over prices of July 1938 (= 100) were 31, 57, 90, 75 and 85 respectively. (I.C.W.A., 1973)

[Cost of Living Index = 152.2]
47. Assess the relative merits of Laspeyre's and Paasche's index numbers. Compute the two series for the following data on imports into a country during 1960-66:

         Value     Value on the basis of
Year    (m. £)     1960 values (m. £)
1960    1,044           1,044
1961      861           1,067
1962      702             939
1963      675             946
1965      738           1,012
1956    1,077
1956      848
(B.A., Madras, 1971)
48. The following are the index numbers of prices (1959 = 100):

Year   Index      Year   Index
1959    100       1964    410
1960    110       1965    400
1961    120       1966    380
1962    200       1967    370
1963    400       1968    340

Shift the base from 1959 to 1965 and recast the index numbers. (M. Com., Mysore)

[Year                   1959  1960  1961  1962   1963   1964   1965  1966  1967  1968
 Index No. (1965=100)     25  27.5  30.0  50.0  100.0  102.5  100.0  95.0  92.5  85.0]

49. The prices of four commodities in a place during two years are given below. Compute the four simple index numbers with the given data for 1920 with 1910 as base year.

Price in rupees per unit for commodities
Year      I      II     III     IV
196…     156    152    156    154
1966     307    310    315    292

(B.A., Madras, 1966)
[Commodity    I       II      III     IV
 I. No.     196.8   205.9   109.9   189.6]


50. Compute the appropriate index number for purposes of comparison from the following data:

         Rice          Wheat         Jowar
Year   Price  Qty.   Price  Qty.   Price  Qty.
1965     4     50      3     10      2      5
1966    10     40      8      8      …      …

[P01 = 250 (Fisher's method)]


51. In the preparation of rural price index numbers, the following results were obtained:

Group       Wt.   Oct. 1966   Nov. 1966
Food        81       323          44
Lighting     2       190         186
Clothing     8       432         397
Misc.        9       369         377

Find the index number of rural prices for the two months. (B. Com., Mysore, 1966)
[I. No. for Oct. = 333.2; I. No. for Nov. = 348.5]


52. Calculate Laspeyre's, Paasche's and hence Fisher's index for the following data:

                    Wheat   Rice   Maize
Quantity 1959         15      5      10
Quantity 1964         12      …       3
Price (Rs.) 1959       …     20       4
Price (Rs.) 1964      22     27       7

(B. Com., Mysore, 1967)
[Laspeyre's index = 146.6; Paasche's index = 145.74; Fisher's index = 145.96]
53. Obtain Paasche's Index Number for the following data (1959 base year):

                  Price              Quantity
Commodities    1959     1965       1959     1965
Wheat          69.1    207.0        937      741
Oats             …      66.1      1,499      958
Rye            43.9    102.0         30       39
Barley         40.5     98.9        400        …
Corn           56.8    130.0      3,242    2,341

(B. Com., Bombay, 1967)
[Py 72419) 


54. From the information given below, calculate the index number for the "oils and fats" sub-group for the month of June, 1966 (Base: 1960 = 100):

                  Unit of      Weight          Price per unit of quantity (Rs.)
Item              quantity     (expenditure)   Basic price    June 1965
Coconut oil       500 ml.          9.55            1.36          2.64
Groundnut oil     500 ml.         71.05            1.00          2.33
Vanaspati         500 g           19.40            1.75          3.37

(B. Com., Bombay, 1972)
55. Calculate Fisher's Ideal Index Number from the following group of two items:

                 Base year                  Current year
Item No.   Price in Rs.  Quantities    Price in Rs.  Quantities
1               4            …              …            4
2               8           15              7            5

(B. Com., Bombay, 1970)


[Pore $52] 
56. In 1920, a Statistical Bureau started an index of production based on 1914, with the following results:

Year     1914 (Base)   1920   1929
Index        100        120    200

In 1930, the Bureau reconstructed the index on a plan with base 1929:

Year     1929 (Base)   1935
Index        100        150

In 1936, the Bureau again reconstructed the index on yet another plan with the base year 1935:

Year     1935 (Base)   1939
Index        100        120

Obtain a continuous series with the base 1935 by splicing the three series. (B. Com., Bombay, 1971)

[Year     1914   1920   1929   1935   1939
 Index      33     40     67    100    120]
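Splicing backwards onto the latest base works one link at a time. Each older series is multiplied by the factor that makes it agree with the newer series in their common year: here 100/150 at 1929 and then (66.67/200) at 1914. A sketch under the assumption that consecutive series share at least one year, with illustrative names:

```python
def splice_backward(series):
    """Splice index series (oldest first) onto the base of the latest one.
    Each dict maps year -> index; consecutive series must share a year."""
    spliced = dict(series[-1])
    for older in reversed(series[:-1]):
        overlap = max(set(older) & set(spliced))          # common (link) year
        factor = spliced[overlap] / older[overlap]
        for year, value in older.items():
            spliced.setdefault(year, value * factor)      # keep newer values
    return dict(sorted(spliced.items()))


series = [
    {1914: 100, 1920: 120, 1929: 200},   # base 1914
    {1929: 100, 1935: 150},              # base 1929
    {1935: 100, 1939: 120},              # base 1935
]
for year, value in splice_backward(series).items():
    print(year, round(value, 1))
```

Splicing forwards, as in Question 41, is the mirror image: the newer series is multiplied by (old index in the link year) / 100.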


57. In the construction of a certain Cost of Living Index Number, the following group index numbers were found. Calculate the Cost of Living Index Number by using (i) the weighted arithmetic mean, and (ii) the weighted geometric mean:

Group                   Index Number   Weights
1. Food                      350           5
2. Fuel and lighting         200           1
3. Clothing                  240           1
4. House Rent                160           1
5. Miscellaneous             250           2

(B. Com., Bombay, 1971)
[Index based on weighted arithmetic mean = 285
 Index based on weighted geometric mean = 275.4]
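Both averages in Question 57 can be computed as follows. The weighted arithmetic mean is ΣIW/ΣW, and the weighted geometric mean is antilog(ΣW log I / ΣW). Natural logarithms give the same result as common logarithms here, because exp undoes log. Computed this way, the geometric mean comes out at about 275.5. The 275.4 above reflects working with four-figure log tables.

```python
from math import exp, log

groups = {  # group: (index number, weight), Question 57
    "Food": (350, 5),
    "Fuel and lighting": (200, 1),
    "Clothing": (240, 1),
    "House rent": (160, 1),
    "Miscellaneous": (250, 2),
}

total_w = sum(w for _, w in groups.values())
weighted_am = sum(i * w for i, w in groups.values()) / total_w
# Weighted GM = antilog( sum(W log I) / sum(W) ).
weighted_gm = exp(sum(w * log(i) for i, w in groups.values()) / total_w)
print(round(weighted_am, 2), round(weighted_gm, 2))
```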


58. Calculate a suitable Index Number for the year 1965 with 1960 as base using the following data:

                 1960                1965
Commodities   Price  Quantity     Price  Quantity
A               12     100          20     120
B                4     200           4     240
C                8     120          12     150
D               20      60          24      50

(B. Com., Madurai, 1967; B. Com., Mysore, 1969)
[Fisher's Index P01 = 137.4]
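Fisher's index is the geometric mean of Laspeyre's index (Σp1q0/Σp0q0 × 100) and Paasche's index (Σp1q1/Σp0q1 × 100). The Python sketch below applies these formulas to Question 58 and reproduces the answer of 137.4. The helper names are illustrative.

```python
from math import sqrt


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, q0, p1):
    return 100 * aggregate(p1, q0) / aggregate(p0, q0)


def paasche(p0, p1, q1):
    return 100 * aggregate(p1, q1) / aggregate(p0, q1)


def fisher(p0, q0, p1, q1):
    # Geometric mean of the Laspeyre and Paasche indices.
    return sqrt(laspeyres(p0, q0, p1) * paasche(p0, p1, q1))


# Question 58: commodities A-D, base year 1960, current year 1965.
p0, q0 = [12, 4, 8, 20], [100, 200, 120, 60]
p1, q1 = [20, 4, 12, 24], [120, 240, 150, 50]
print(round(laspeyres(p0, q0, p1), 2),
      round(paasche(p0, p1, q1), 2),
      round(fisher(p0, q0, p1, q1), 2))
```

With the data of Question 77 the same functions give about 116, matching the answer printed there.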
59. Calculate Laspeyre's and Paasche's Index Numbers from the following data:

              Base year                  Current year
         Quantity   Price per lb.    Quantity   Price per lb.
Bread       60       40 paise           …        30 paise
Meat        40       45   "            50        50   "
Tea          5       90   "             …         …

(B. Com., Delhi, 1969)

[P01 Laspeyre's = 86.02
 P01 Paasche's = 81.25]


60. Find index numbers for the years 1967, 1968 and 1969 by the chain base method, with base year 1966, from the following table:

Year         1966   1967   1968    1969
Link Index    100    110   95.5   109.5
(I.C.W.A., Jan. 1969)

[Chain Indices: 1967 = 110, 1968 = 105.05, 1969 = 115.03]
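Chain indices are built by multiplying link relatives successively, C_t = C_(t-1) × L_t / 100. Going the other way, link relatives come from fixed-base indices as L_t = I_t / I_(t-1) × 100, which is what Question 73 asks for. A sketch of both directions, with illustrative names:

```python
def chain_from_links(links):
    """Chain indices: C_t = C_(t-1) x L_t / 100, starting from 100."""
    chain, current = [], 100.0
    for link in links:
        current = current * link / 100
        chain.append(current)
    return chain


def links_from_fixed(fixed):
    """Link relatives: L_t = I_t / I_(t-1) x 100 (first year taken as 100)."""
    return [100.0] + [100 * cur / prev for prev, cur in zip(fixed, fixed[1:])]


links_q60 = [100, 110, 95.5, 109.5]            # 1966-1969
print([round(c, 2) for c in chain_from_links(links_q60)])
print([round(l, 2) for l in links_from_fixed([158, 199, 204, 190, 196, 200])])
```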


61. Calculate Fisher's index from the data given below (Base = 1969):

                1968            1969
Commodities   P      Q        P      Q
…            6.4     93       46     49
…             …     146        …     89
…            132      …       28     68
…             38     32       40     26

(B. Com., Kerala, 1969)
[P01 Fisher's = 89.4]

62. Interpret the following data relating to Indian prices:

                                        Base            Nov. 1962   Nov. 1963   Nov. 1964
Wholesale prices
  (a) All commodities               1952-53 = 100         130.1       124.7       156.8
  (b) Food articles                       "               129.6       136.6       166.3
Consumer prices (All India)           1949 = 100          134         138         163.0
Security prices (Variable dividend) 1952-53 = 100         170.5       174.5       166.0

(M.A. Econ., Punjab, 1969)
63. (a) Compute Index Numbers from the following data by using (i) Laspeyre's, (ii) Paasche's, and (iii) Fisher's Ideal Method:

              Base year          Current year
Commodity   Quantity  Price    Quantity  Price
A              12      10         15      12
B              15       7          …       …
C              24       5          …       …
D               5      16          5      14

(B. Com., Bombay, 1970)

[Laspeyre's Index = 118.8
 Paasche's Index = 112.8
 Fisher's Index = 115.7]

(b) The following table gives the per capita income and the cost of living index of a particular community. Calculate the real income taking into account the rise in the cost of living:

Year                            1959  1960  1961  1962  1963  1964  1965  1966
Cost of Living Index (1959)      100   104   115   160   210   260   300   320
Per Capita Income (Rs.)          360   400   480   520   550   590   610   650

(B. Com., Bombay, 1970)

[Year          1959   1960   1961   1962   1963   1964   1965   1966
 Real Income  360.0  384.6  417.3  325.0  261.9  226.9  203.0  203.1]


64. The following figures are given to you. If the cost of living has been rising faster than the price level of capital goods and you want this to be reflected in your final table, what other index number series will you require and how will you use them?

Year                         1      2      3      4      5      6
Money national income    2,000  2,050      …  2,150  2,300  2,500
General Price Index         95     91    100    110    120    125

(B.A. Hons. Econ., Delhi, 1970)


65. An index is at 100 in 1961. It rises 4% in 1962, falls 6% in 1963, falls 4% in 1964 and rises 3% in 1965. Calculate the index numbers for the five years with 1963 as base. (M. Com., Delhi, 1970)

[Year     1961    1962   1963   1964   1965
 I. Nos. 102.2   106.3    100   96.0   98.9]
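Question 65 has two steps. First, build the series on the 1961 base by applying each percentage change in turn: 100, 104, 97.76, 93.85, 96.67. Then shift the base to 1963 by dividing by 97.76. The sketch below, with illustrative names, agrees with the answer above to within rounding.

```python
def index_from_changes(start, pct_changes):
    """Apply successive percentage changes to a starting index value."""
    values = [start]
    for pct in pct_changes:
        values.append(values[-1] * (1 + pct / 100))
    return values


years = [1961, 1962, 1963, 1964, 1965]
raw = index_from_changes(100, [4, -6, -4, 3])   # 1961 = 100
base = raw[years.index(1963)]                   # shift base to 1963
rebased = {y: 100 * v / base for y, v in zip(years, raw)}
for year, value in rebased.items():
    print(year, round(value, 2))
```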


66. (a) What is an Index Number? Discuss with a suitable example the application of Index Numbers in business.

(b) An enquiry into the budgets of the middle class families in Bombay gave the following information:

Expenses on           Food   Rent   Clothing   Fuel   Misc.
                       35%    15%      20%      10%     20%
Price (1955) (Rs.)     150     50      100       20      60
Price (1966) (Rs.)     174     60      125       25      90

What changes in the cost of living figure of 1966 have taken place as compared to 1955? (M.B.A., Delhi, 1970)
[Cost of Living Index = 126.1
 Cost of living has gone up by 26.1%]
67. (a) Calculate the index number of prices for 1967 on the basis of 1961 from the data given below:

Commodity          A    B    C    D    E
1961 Prices Rs.   16   40    8    5   10
1967 Prices Rs.   20   60    8    6    8

Give the weights 40, 25, 5, 20, 10 respectively.
(b) (i) Indicate the importance of Base Shifting.

(ii) Shift the base of the following series of Index Numbers from 1958 to 1962 and recast the Index Numbers.

Years           1957  1958  1959  1960  1961  1962
Index Numbers    105   100    65    72    80    75
Years           1963  1964  1965  1966
Index Numbers     85    90    92    88

[(a) P01 = 124.5] (B. Com., Mysore, 1970)


68. The average annual wholesale prices of steel per ton in various years are given below. Construct an index with base 1944.

Year   Prices     Year   Prices     Year   Prices
1944     78       1950     98       1956     99
1945     54       1951     94       1957     76
1946     67       1952      …       1958     75
1947     56       1953      …       1959     71
1948     72       1954     76       1960     50
1949    102       1955    112

(B. Com., Allahabad, 1969)

[100, 69.2, 85.9, 71.8, 92.3, 130.8, 125.6, 120.5, 212.8,
 100, 97.4, 143.4, 126.9, 97.4, 96.1, 64.96]


69. The following table gives the average prices of rice during 1958 and Oct. 1968 for 6 different markets situated in U.P., along with appropriate weights for the markets. Calculate the price index for Oct. 1968 taking 1958 as the base period.

                      Price per Md. in Rs.
Market   Weight   Average for 1958   Average for 1968
I           …          20.00              28.60
II          …          22.50              32.40
III        55          27.50              36.30
IV         22          25.00              32.00
V          50          25.50              35.70
VI         30          24.00              36.00

(B. Com., Banaras, 1969)
[139.53]


70. The following table shows the domestic investment in India for certain years, in crores of rupees, at 1948-49 prices:

Year      Investment     Year      Investment
1950-51       721        1958-59      1,317
1955-56     1,261        1959-60      1,482
1956-57     1,446        1960-61      1,826
1957-58     1,678        1961-62      1,736

Construct a simple index of investment with 1950-51 as base. (M.A. Econ., Agra)

[Year    1950-51  1955-56  1956-57  1957-58  1958-59  1959-60  1960-61  1961-62
 Index     100.0    174.9    200.6    233.7    182.6    200.6    253.3    240.8]
71. Construct a cost of living index number for 1967 on the basis of 1966 from the following data. Use the family budget method, taking the average of quantities as weights.

Year            Food   Sugar   Misc.
1966   Price       2     3.5      6
       Qty.       70      63     30
1967   Price       3       5      5
       Qty.       75       …     35

(M.B.A., Delhi, 1969)
[124.5]


72. Explain what is meant by the Factor Reversal Test. Construct, with the help of the data given below, Fisher's Ideal Index and show that it satisfies the Factor Reversal Test.

             Base year          Current year
Commodity   Price  Quantity    Price  Quantity
A             12     100         20     112
B              4     200          4     240
C              8     120         12     120
D             20      60         24      48
E             16      80         24      52
(M. Com., Allahabad, 1969)
[P01 = 139.8]
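The factor reversal test requires P01 × Q01 = Σp1q1 / Σp0q0. The time reversal test requires P01 × P10 = 1. The Python sketch below checks both numerically for Fisher's formula on the data of Question 72. Here Q01 is Fisher's formula with the roles of prices and quantities interchanged, and the ratios are kept as fractions rather than multiplied by 100. The helper names are illustrative, and the script also prints P01 itself.

```python
from math import isclose, sqrt


def agg(a, b):
    """Sum of products, e.g. agg(p1, q0) = sum of p1 x q0."""
    return sum(x * y for x, y in zip(a, b))


def fisher_price(p0, q0, p1, q1):
    return sqrt((agg(p1, q0) / agg(p0, q0)) * (agg(p1, q1) / agg(p0, q1)))


def fisher_quantity(p0, q0, p1, q1):
    # Same formula with the roles of prices and quantities interchanged.
    return sqrt((agg(q1, p0) / agg(q0, p0)) * (agg(q1, p1) / agg(q0, p1)))


# Question 72, commodities A-E.
p0 = [12, 4, 8, 20, 16]
q0 = [100, 200, 120, 60, 80]
p1 = [20, 4, 12, 24, 24]
q1 = [112, 240, 120, 48, 52]

P01 = fisher_price(p0, q0, p1, q1)
Q01 = fisher_quantity(p0, q0, p1, q1)
value_ratio = agg(p1, q1) / agg(p0, q0)
print(f"P01 = {100 * P01:.2f}")
print("Factor reversal test satisfied:", isclose(P01 * Q01, value_ratio))

# Time reversal test: the index for 1 on base 0 times the index for 0 on base 1.
P10 = fisher_price(p1, q1, p0, q0)
print("Time reversal test satisfied:", isclose(P01 * P10, 1.0))
```

Laspeyre's and Paasche's indices on their own fail both tests. That is the usual argument for calling Fisher's formula "ideal".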


73. From the fixed base index numbers given below, prepare chain base index numbers.

Years        1965  1966  1967  1968  1969  1970
Index Nos.    158   199   204   190   196   200
(B. Com., Meerut, 1971)
[Year            1965    1966    1967    1968     1969     1970
 Chain Indices    100  104.27   104.8   93.14   103.16   102.04]
74. Construct Fisher's Index from the following table:

         Base year        Current year
       Price   Qty.     Price   Qty.
A        2      40        6      50
B        4      50        8      40
C        6      20        9      30
D        8      10        6      20
E       10      10        5      70
(B. Com., Banaras, 1972)

[P01 = 143.78]
75. Apply the geometric mean to find the general index from the following group indices by assigning the given weights:

Group           A     B     C     D     E     F
Group Index   118   120    97   107   111    93
Weight          3     1     2     6     2     2
(I.C.W.A., 1969)

[G.M. Index = antilog (Σ W log I / Σ W) = 108.1]


76. Calculate the index number of prices for 1972 on the basis of 1971 from the data given below:

                       Price per unit     Price per unit
Commodity   Weights    in 1971 (Rs.)      in 1972 (Rs.)
Rice          40           16.00              20.00
Wheat         25           40.00              60.00
Linseed        5            0.50               0.50
Gur           20            5.12               6.25
Tobacco       10            2.00               1.50

(T.D.C. II Yr., Rajasthan, 1973)

[Index No. = 124.4]


77. Apply Fisher's Ideal Formula and construct an index number from the data given below:

            Base year        Current year
Commodity  Price   Qty.     Price   Qty.
A            8      50        12      60
B            3      20         4      40
C           10      24        15      30
D            5     100         4     200

(B. Com., Rajasthan, 1972)

[P01 = 116]


78. … compared with a fixed base … 243 and 285. Calculate the consumer …

79. The relative importance of the following eight groups of family expenditure was found to be: food 348, rent 88, clothing 97, fuel and light 65, household goods 71, miscellaneous goods 35, services 79, drink and tobacco 217.

The corresponding increases in price for Oct. 1960 gave the following values: 25, … 4. Calculate the percentage increase in the group 'services' if the percentage increase for the whole group is 15.2. (… 1968)


80. With the help of the following data, construct an appropriate index:

          Wheat           Rice            Gram
Year    Qty.  Price    Qty.  Price    Qty.  Price
1959     15     …        5    202      10     …
1968     12    223       4    274       8     7

(B. Com., Meerut, 1970)

[P01 = 120.79]
81. From the figures given below, calculate the index numbers of real wages taking 1968 as base:

        Average monthly wages    Consumer Price Index No.
Year           (Rs.)                  (1965 = 100)
1965            120                       100
1966            132                       120
1967            143                       130
1968            150                       150
1969            171                       180

(M. Com., Delhi, 1970)

[Year                       1965   1966   1967   1968   1969
 Index No. of real wages     120    110    110    100     95]
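Deflating means dividing money wages by the price index and multiplying by 100. That gives wages at 1965 prices, which are then expressed as an index with 1968 = 100. The short Python sketch below, with illustrative variable names, reproduces the answer to Question 81. Questions 40, 63(b) and 84 follow the same first step.

```python
wages = {1965: 120, 1966: 132, 1967: 143, 1968: 150, 1969: 171}   # Rs. per month
cpi = {1965: 100, 1966: 120, 1967: 130, 1968: 150, 1969: 180}     # 1965 = 100

# Deflate money wages: real wage = money wage / CPI x 100 (at 1965 prices).
real = {year: 100 * wages[year] / cpi[year] for year in wages}
# Express the real wages as an index with 1968 = 100.
real_index = {year: 100 * r / real[1968] for year, r in real.items()}
for year, value in real_index.items():
    print(year, round(value, 1))
```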
82. From the data given below, calculate the price index number using Fisher's Ideal Formula.

             Base year        Current year
Commodity   Price   Qty.     Price   Qty.
A            10      59        12     60
B             8      30         9     32
C             5      35         7     40

(B. Com., Rajasthan, 1969)


83. Find the current consumer price index (Miscellaneous Group) with the help of the data given in the following table:

                     Basic prices   Current prices
Item        Weight   (Rs.)          (Rs.)
Barber        21     0.05           0.12
Washerman     23     0.04           0.16
Soap 321; 0:50 1:60 
Betelnut      21     0.50           3.20
Biris         23     0.05           0.24

Briefly interpret your result. (B. Com., Banaras, 1972)

[Price Index = 425.6]
84. The following table gives the per capita income and cost of living index number of a particular community. Deflate the per capita income by taking into account the rise in the cost of living.

        Per capita    Cost of living index No.
Year    income        (Base 1959)
1959       65              100
1960       70              110
1961       75              120
1962       80              130
1963       90              150
1964      100              200
1965      120              250
1966      150              300

(M. Com., Banaras, 1969)

[Year: 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966; Real Wages (Rs.): 65.0, 63.6, 62.5, 61.5, 60.0, 50.0, 48.0, 42.9]
85. Construct index numbers of price from the following data by applying (i) Laspeyres' method, (ii) Paasche's method, and (iii) Bowley's method:

              Base year    Current year
Commodities   Price        Price   Qty.
A               2            4       6
B               5            6       5
C               4            5      10
D               2            2      13

Comment on these index numbers.

[Laspeyres' Index = 125.0; Paasche's Index = 126.2; Bowley's Index = 125.6]
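The base-year quantities needed for Laspeyres' index are not legible in this copy, so the following sketch only computes Paasche's index from the legible columns. It then takes Bowley's index as the arithmetic mean of the printed Laspeyres value (125.0) and that Paasche value. It assumes C's base price is 4, since the scan shows "24" and only 4 reproduces the printed Bowley answer.

```python
# Exercise 85. Assumption: C's base price is 4 (the scan shows "24"); this value
# reproduces the printed Bowley answer of 125.6.
p0 = {"A": 2, "B": 5, "C": 4, "D": 2}    # base-year prices
p1 = {"A": 4, "B": 6, "C": 5, "D": 2}    # current-year prices
q1 = {"A": 6, "B": 5, "C": 10, "D": 13}  # current-year quantities

# Paasche: current quantities as weights
paasche = 100 * sum(p1[k] * q1[k] for k in p1) / sum(p0[k] * q1[k] for k in p0)
laspeyres = 125.0                      # printed answer; base quantities not legible here
bowley = (laspeyres + paasche) / 2     # Bowley: arithmetic mean of Laspeyres and Paasche

print(round(paasche, 2), round(bowley, 1))
```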
86. The price quotations of four different commodities for 1951 and 1965 are given below:

                          Price in Rs.
Commodities   Weight     1965     1951
A               5        4.50     2.00
B               7        3.20     2.50
C               6        4.50     3.00
D               2        1.80     1.00

Calculate the index number for 1965 with 1951 as base, by using
(i) simple average of price relatives;
(ii) weighted average of price relatives.
(B. Com., Bombay, 1971)

[(i) 170.75, (ii) 164.05]
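Both parts of Exercise 86 can be checked with a minimal sketch. It assumes B's weight is 7: the scan reads "1", but 7 is the value that reproduces the printed answer 164.05.

```python
# Exercise 86 (1951 = 100). Assumption: B's weight is 7 (the scan reads "1").
rows = [  # (name, weight, price 1951, price 1965)
    ("A", 5, 2.0, 4.5),
    ("B", 7, 2.5, 3.2),
    ("C", 6, 3.0, 4.5),
    ("D", 2, 1.0, 1.8),
]

relatives = {name: 100 * p1 / p0 for name, _, p0, p1 in rows}   # price relatives
simple = sum(relatives.values()) / len(relatives)               # (i) simple average
weighted = (sum(w * relatives[n] for n, w, _, _ in rows)
            / sum(w for _, w, _, _ in rows))                    # (ii) weighted average
print(round(simple, 2), round(weighted, 2))
```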


87. The data given below show the per capita consumption of a few selected items in the years 1959 and 1969. Construct two Consumers' Price Index numbers for the year 1969, one based on Laspeyres' method and the other based on Paasche's method.

                  Price per kg (Rs.)     Consumption in kg
Food item         1959      1969         1959      1969
Flour              3.30      6.90        185.0     152.0
Potatoes           2.25      3.31        192.0     191.0
Veal              17.40     55.70         10.5      10.9
Sugar              6.40      9.70         94.7      98.1
Coffee            34.50     65.00           …        6.8
Cheese            26.80     61.40          3.4       3.3
Breakfast food    18.60     30.10          7.4       5.9

(B.A. Hons. Econ., Delhi, 1972)

88. From the data given below construct an index number of the group of four commodities by using Fisher's Ideal Formula:

             Base year                        Current year
Commodity    Price per unit   Expenditure     Price per unit   Expenditure
             (Rs.)            (Rs.)           (Rs.)            (Rs.)
1               2               40               5                75
2               4               16               8                40
3               1               10               2                24
4               5               25              10                60

(B. Com., Bombay, 1969)

[P01 = 219.1]
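When only prices and expenditures are given, quantities can be recovered as q = expenditure / price before applying Fisher's formula. The following sketch reproduces the printed 219.1. It reads row 4 as "5, 25, 10, 60", which is assembled from scattered fragments in the scan and agrees with that answer.

```python
from math import sqrt

# Exercise 88: (p0, base expenditure, p1, current expenditure) per commodity
rows = [(2, 40, 5, 75), (4, 16, 8, 40), (1, 10, 2, 24), (5, 25, 10, 60)]

s = {"p1q0": 0.0, "p0q0": 0.0, "p1q1": 0.0, "p0q1": 0.0}
for p0, e0, p1, e1 in rows:
    q0, q1 = e0 / p0, e1 / p1   # implied quantities
    s["p1q0"] += p1 * q0
    s["p0q0"] += e0             # p0 * q0 is the base expenditure itself
    s["p1q1"] += e1             # p1 * q1 is the current expenditure itself
    s["p0q1"] += p0 * q1

fisher = 100 * sqrt((s["p1q0"] / s["p0q0"]) * (s["p1q1"] / s["p0q1"]))
print(round(fisher, 1))
```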


89. The following figures relate to the prices and quantities of certain commodities. Construct an appropriate index number using the following data:

             1970                  …
Commodity    Quantity   Price      Quantity   Price
A              50        32          50        30
B              35        30          40        25
C              55        16          50        18

Check whether this index number satisfies the 'time reversal test'.

[P01 = 105.4] (B. Com., Bombay, 1971)
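The time reversal test requires P01 × P10 = 1, and Fisher's index satisfies it exactly. The following sketch assumes the right-hand (Quantity, Price) pair is the base period and the 1970 pair is the current period, because only that orientation reproduces the printed P01 = 105.4.

```python
from math import sqrt

# Exercise 89. Assumption: right-hand pair = base period, 1970 pair = current period.
base = {"A": (50, 30), "B": (40, 25), "C": (50, 18)}     # (q0, p0)
current = {"A": (50, 32), "B": (35, 30), "C": (55, 16)}  # (q1, p1)

def fisher(b, c):
    """Fisher's ideal price index (as a ratio) from period b to period c."""
    pc_qb = sum(c[k][1] * b[k][0] for k in b)
    pb_qb = sum(b[k][1] * b[k][0] for k in b)
    pc_qc = sum(c[k][1] * c[k][0] for k in b)
    pb_qc = sum(b[k][1] * c[k][0] for k in b)
    return sqrt((pc_qb / pb_qb) * (pc_qc / pb_qc))

p01 = fisher(base, current)   # forward index
p10 = fisher(current, base)   # index with the periods reversed
print(round(100 * p01, 1), round(p01 * p10, 10))
```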
90. Construct a suitable index number with the help of the following data:

          Wheat           Rice            Gram
Year    Qty.  Price     Qty.  Price     Qty.  Price
1965     15    14         5    20        10     4

(B. Com., Bombay, 1972)
[Hint: Fisher's Index P01 = 161.4]


91. The price quotations of 5 different commodities for the years 1965 and 1970 are as follows:

                         Price in Rs.
Commodity   Weight      1965     1970
A             4         2.50     4.00
B             8         2.00     3.20
C             6         3.20     4.00
D             2         3.60     4.50
E             3         1.00     1.70

Calculate the price index number for the year 1970 with 1965 as the base by using
(i) simple average of price relatives, and
(ii) weighted average of price relatives.
[(i) 153, (ii) 160.78] (B. Com., Poona, 1973)


92. Compute a weighted cost of living index for the year 1970 based on the year 1902:

             Prices per kg (in paise)
Commodity    Base year     1970        Weights
A               60          108           40
B              125          225           27
C               50           94           17
D               40           65           13
E              120          240            3

[P01 = 179.68] (B. Com., Poona, 1973)


93. Construct cost of living index numbers from the following price relatives for the years 1971 and 1972 with 1968 as base. The weights for food, rent, clothing, fuel and light and miscellaneous are 60, 16, 12, 8 and 4 respectively:

Years    Food    Rent    Clothing    Fuel & lighting    Misc.
1968      100     100      100            100            100
1971      107     105      108            101            102
1972      108     106      110            104            104

[For 1971, P01 = 106.12 and for 1972, P02 = 107.44] (B. Com., Raj., 1973)


94. Prices paid and quantities consumed of three commodities during two time periods are:

             Time period I      Time period II
Commodity    p1      q1         p2      q2
A             …       2         15       1
B            15       3         10       3
C            20       4         15       4

(i) Keeping the quantity mix of period I, what percentage change in prices has occurred between the periods?
(ii) What is the percentage change in prices if the quantity mix of period II is used as the base?
(iii) What is the percentage change in quantities between the two periods where prices in period … are the base?
(iv) What percentage change in the value of consumption has occurred?
(B.A. Hons. Econ., Delhi, 1974)
95. The prices of agricultural commodities for 1966-67 and for the month of December 1970 are given below along with the value of these commodities in 1966-67:

                             Prices (Rs.)            Value of output
Commodities     Unit       1966-67    Dec. 1970     (million rupees)
Rice            Maund       13.75       13.75           8,364
Wheat           Maund        9.70        9.70           2,207
Jowar           Maund        6.03        8.00             876
Cotton (raw)    784 lb.    466.00      433.00             701
Tea             lb.          1.25        1.75             534

Calculate the weighted index number of prices of these commodities for December 1970 taking 1966-67 as base. (B.A. Hons. Econ., Delhi, 1975)


96. (a) "For constructing index numbers the best method on theoretical grounds is not the best method from the practical point of view, so out of a long list of methods, no method is really ideal." Comment. (M. Com., Delhi, 1975)

(b) A price index was started with 1961 as base. By 1965, it rose by 20%. The link relative for 1966 was 90. In this year a new … This new series rose by 10 points by next year. …


97. Apply Fisher's method and calculate the index from the following data:

               1973              1974
Commodities    Price   Qty.      Price   Qty.
A               10      4         12      3
B               15      6         20      5
C                2      5          5      6
D                4      4          4      4

(B.A. Hons. Econ., Kurukshetra, 1975)
Ind: 39'9 
SECTION 14


ANALYSIS OF TIME SERIES 


1. Explain clearly the meaning of 'Time Series Analysis'. Indicate the importance of such analysis in business. (B. Com., …)

What is a time series? Mention its important components. Explain these with suitable examples. Discuss briefly the methods of smoothing a time series. (B. Com., Bombay, 1967)
What purpose is served by analysing a time series? Explain, giving examples, the components of time series. (B. Com., Bombay, 1970)

(a) What are the different components of fluctuation in a time series? Elucidate the methods available for measuring the trend component and their relative merits and demerits. (M. Com., Delhi, 1967; B. Com., Delhi, 1968)
(b) What are the limitations and advantages of the moving average method of trend fitting? (B. Com., Delhi, 1973)


4. (a) "The analysis of time series consists of the description and measurement of the various changes or movements as they appear in the series during the period of time." Classify these changes or movements. Also mention the different methods used for measuring trend and explain fully any of them. (M. Com., Nagpur, 19…)


(b) Explain briefly the different components of time series.
(B.A. Hons. Econ., Kurukshetra, 1975)


5. (a) Explain how a 'growth factor', a 'decline factor', a 'seasonal factor' and a 'cyclical factor' affect a variable over a period of time.
(B. Com., Mysore, 1969)
(b) Distinguish between trend, seasonal variations and cyclical fluctuations in a time series. How can trend be isolated from fluctuations?
(M.A. Econ., Meerut, 1975)


6. Explain the utility of time series. How can you compare one time series with another? What is 'lag' in time series? Indicate how it is calculated.
(I.C.W.A., July, 1969)


7. (a) What is secular trend? Critically examine the various methods of measuring trend.
(B. Com., Madras, 1968; B. Com., Poona, 1967; B. Com., Delhi, 1974)

(b) Explain the meaning of Time Series Analysis. Mention the important components into which a time series may be analysed. Discuss briefly the importance of such analysis in business. (M.B.A., Delhi, 1973)


8. (a) What is an economic time series? Explain the meaning and importance of trend in economic time series. (B.A. Hons. Econ., Delhi, 1967)

(b) What are the different components of a time series?
(B.A. Hons. Econ., Delhi, 1969)


9. (a) How would you analyse a time series of records extending over 30 years? Describe in detail.
(b) What do you understand by the term 'moving average'? Indicate its uses.
10. (a) What is 'Secular Trend'? Discuss any two methods of isolating trend values in a time series.
(b) What is meant by trend? How would you fit a … by the method of least squares? (B. Com., …)
11. (a) What is meant by seasonal fluctuation? State the procedure for obtaining a seasonal index by the method of monthly averages.
(b) Explain the different methods of measuring the seasonal component in time series data. How will you eliminate the seasonal component?
12. (a) What is meant by Time Series in Statistics?
(b) Distinguish between 'Secular Trend' and 'Seasonal Movement' in a time series.
(c) Are all 'periodical movements' necessarily seasonal? Give reasons for your answer with appropriate illustrations.
13. Explain briefly the various methods of analysing seasonal fluctuations 


in time series. 
14. What are the different components of an economic time series ? 


How would you determine seasonal index ? 

15. (a) Explain how a time series can be analysed into three components: (i) trend, (ii) seasonal variations, and (iii) irregular fluctuations.

(b) Distinguish between seasonal variations and cyclical fluctuations.

16. Describe any method you are acquainted with for the measurement of cycles of business activity and state how such knowledge could be used in practice.

17. (a) What are the components of a time series? Explain the methods of decomposing such a series. Which are the periodic and which are the non-periodic components?
(b) In what way is a seasonal index helpful to a business executive? Illustrate your answer with suitable examples.
(c) Explain the method of moving average. How is it used in measuring trend in an analysis of a time series?


its uses, 
18. (a) Why do we deseasonalize data? Explain the ratio-to-moving-average method to compute the seasonal index.
(b) Explain the following:
"The business analyst who uses moving averages to smoothen his data, while in the process of trying to discover business cycles, is likely to come up with some non-existent cycles." (B.A. Hons. Econ., Delhi, 1974)
19. (a) State the conditions under which a moving average can be recommended for trend analysis. How will you determine the period of the moving average? (C.A., 1974)
(b) Explain the role of the method of moving average in time series analysis. (M. Com., Delhi, 1975)


20. Draw a time series graph relating to the following data and show the trend free hand:

        Production            Production            Production
Years   of tea        Years   of tea        Years   of tea
        ('000 lb.)            ('000 lb.)            ('000 lb.)
1951    16…           1959    210           1967    366
1952    17…           1960    237           1968    325
1953    236           1961    203           1969    256
1954    213           1962    215           1970    304
1955    180           1963    280           1971    291
1956    163           1964    351           1972    277
1957    180           1965    320           1973    274
1958    187           1966    370


21. Apply the method of semi-averages for determining the trend of the following data and estimate the value for 1970:

        Sales                       Sales
Years   (thousand units)    Years   (thousand units)
1963        20              1966        30
1964        24              1967        28
1965        22              1968        32

If the actual figure of sales for 1970 is 35,000 units, how do you account for the difference between the figure you obtain and the actual figure given to you? (M.B.A., Delhi, 1970)
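In the method of semi-averages, each half of the series is averaged and the average is plotted against the middle year of that half. The straight line through the two points is the trend. The following minimal sketch applies this to Exercise 21 and extends the line to 1970.

```python
# Exercise 21: method of semi-averages (even number of years, so the halves are equal)
years = [1963, 1964, 1965, 1966, 1967, 1968]
sales = [20, 24, 22, 30, 28, 32]   # thousand units

half = len(years) // 2
avg1 = sum(sales[:half]) / half    # plotted against the middle of the first half
avg2 = sum(sales[half:]) / half    # plotted against the middle of the second half
mid1 = sum(years[:half]) / half
mid2 = sum(years[half:]) / half
slope = (avg2 - avg1) / (mid2 - mid1)   # change per year along the semi-average line

estimate_1970 = avg2 + slope * (1970 - mid2)
print(avg1, avg2, round(slope, 3), round(estimate_1970, 2))
```

The trend estimate (38 thousand units) can then be set against the actual 35,000 units mentioned in the question.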

22. Plot the following data on a graph paper and ascertain trend by the 
method of semi-averages : 


Years Production Years Production 
(million tonnes) (million tonnes) 

1968 100 1972 108 

1969 120 1973 102 

1970 95 

1971 105 


23. Apply the method of semi-averages to depict the long-term tendency of the following data and estimate the value for 1977:

        Production                  Production
Years   (million tonnes)    Years   (million tonnes)
1967        40              1971        51
1968        44              1972        50
1969        42              1973        54
1970        48
24. Fit a straight line trend by the method of least squares to the following data and obtain the trend value for the year 1972:

Year                       1960   1961   1962   1963   1964   1965
Production (lakh tons)      3.6    3.8    4.4    4.7    5.6    7.3
Year                       1966   1967   1968   1969   1970   1971
Production (lakh tons)      7.1    7.6    7.7    9.0    9.0   10.1

(M. Com., B.H.U., 1972)

[Y1972 = 10.55]
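With x measured from the middle of the period, Σx = 0, so the normal equations reduce to a = ΣY/N and b = ΣxY/Σx². The following minimal sketch applies this to Exercise 24, with x in years from 1965.5, and reproduces the printed trend value for 1972.

```python
# Exercise 24: least-squares straight line Y = a + b*x, x in years from 1965.5
years = list(range(1960, 1972))
prod = [3.6, 3.8, 4.4, 4.7, 5.6, 7.3, 7.1, 7.6, 7.7, 9.0, 9.0, 10.1]  # lakh tons

origin = sum(years) / len(years)          # 1965.5, so that sum(x) == 0
x = [yr - origin for yr in years]
a = sum(prod) / len(prod)                 # with sum(x) == 0, a is the mean of Y
b = sum(xi * yi for xi, yi in zip(x, prod)) / sum(xi * xi for xi in x)

trend_1972 = a + b * (1972 - origin)
print(round(a, 3), round(b, 4), round(trend_1972, 2))
```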
25. Production of rice in a district during the last 10 years is given below:

        Production              Production
Years   (in tonnes)     Years   (in tonnes)
1960    11,200          1965    14,500
1961    12,300          1966    11,600
1962    10,600          1967    14,300
1963    13,400          1968    13,600
1964    13,800          1969    15,400

Using 3-yearly moving averages indicate the trend in the production of rice in the district. (B. Com., Andhra, 1969)
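A moving average of odd period is centred on the middle year of each window, so 3-yearly averages of the 1960 to 1969 data are placed against 1961 to 1968. The following minimal sketch covers Exercise 25. The function name `moving_average` is illustrative.

```python
# Exercise 25: 3-yearly moving averages, each centred on the middle year
years = list(range(1960, 1970))
rice = [11200, 12300, 10600, 13400, 13800, 14500, 11600, 14300, 13600, 15400]

def moving_average(values, period):
    """Simple moving averages of an odd period; one value per full window."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]

ma3 = moving_average(rice, 3)
centres = years[1:-1]          # 1961 ... 1968
for yr, m in zip(centres, ma3):
    print(yr, round(m, 2))
```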


26. The following series relates to the profits of a commercial concern for 8 years:

        Profits             Profits
Years   (Rs.)       Years   (Rs.)
1966    15,420      1970    26,120
1967    14,470      1971    31,950
1968    15,520      1972    35,470
1969    21,020      1973    35,670

Find the trend of profits. Assume a three-year cycle and ignore decimals. (B. Com., Osmania)


… purpose. … to choose a 4-point moving average? …

        Production            Production            Production
Years   (m. tonnes)   Years   (m. tonnes)   Years   (m. tonnes)
1951      351         1957      410         1963      502
1952      366         1958      420         1964      540
1953      361         1959      450         1965      557
1954      362         1960      500         1966      571
1955      400         1961      51…         1967      586
1956      419         1962      455         1968      612

(B.A. Hons. Econ., Delhi, 1969)


28. Find the centered 4-year moving averages for the following time-series data:

Years   1960   1961   1962   1963   1964   1965   1966   1967
Yt      30.1   45.4   39.3   41.4   42.2   46.4   46.6   49.2

(I.C.W.A., 1968)

[Years: 1962, 1963, 1964, 1965; centered 4-year moving averages: 40.56, 42.20, 43.24, 45.12]
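A 4-year moving total falls between two years, so consecutive totals are paired and divided by 8 to centre the average on a year. The following sketch assumes the 1960 value is 30.1, because the scan shows "20.1" and only 30.1 reproduces the printed 1962 answer.

```python
# Exercise 28: centred 4-year moving averages.
# Assumption: the 1960 value is 30.1 (the scan shows "20.1").
years = list(range(1960, 1968))
y = [30.1, 45.4, 39.3, 41.4, 42.2, 46.4, 46.6, 49.2]

totals4 = [sum(y[i:i + 4]) for i in range(len(y) - 3)]            # fall between years
centred = [(t1 + t2) / 8 for t1, t2 in zip(totals4, totals4[1:])]  # centred on a year
centred_years = years[2:2 + len(centred)]                           # 1962 ... 1965
print(list(zip(centred_years, [round(c, 2) for c in centred])))
```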


29. The following table gives the data relating to the average yield of rice in U.P. during 1960-61 to 1970-71 and also the 3-yearly moving averages. Plot the given data on graph paper. Compute 5-yearly moving averages and also show them on the graph.

            Average yield of     3-yearly
Years       rice per acre        moving average
            (in quintals)        (in quintals)
1960-61        5.21                  -
1961-62        4.38                 4.97
1962-63        5.31                 5.38
1963-64        6.46                 5.96
1964-65        6.12                 6.59
1965-66        7.18                 6.50
1966-67        6.21                 6.53
1967-68        6.19                 6.71
1968-69        7.73                 6.69
1969-70        6.15                 7.27
1970-71        7.94                  -

(B. Com., Lucknow, 1972)
30. Compute the trend values by the method of least squares from the data given below:

Year                      1962   1963   1964   1965   1966   1967   1968   1969
No. of sheep (in lakhs)     56     55     51     47     42     38     35     32

(M. Com., Raj., 1969; M. Com., Delhi, 1970)

(Y = 44.5 − 3.714x; origin 1965.5)


31. Below are given the figures of production of a sugar factory:

        Production                      Production
Years   (thousand quintals)     Years   (thousand quintals)
1968        99                  1971        99
1969        83                  1972        92
1970        94                  1973       110

Apply the method of least squares to determine the trend values. Also find out the short-term fluctuations.

(Y = 96.17 + 2.486x; origin 1970.5)
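The following sketch fits the least-squares line for Exercise 31 and obtains the short-term fluctuations as original minus trend. This assumes an additive model, which the exercise does not state. The years are read as 1968 to 1973 (the scan shows "1959" for 1969), which agrees with the printed equation.

```python
# Exercise 31: least-squares trend, then short-term fluctuations = Y - trend
# (additive model assumed)
years = [1968, 1969, 1970, 1971, 1972, 1973]
y = [99, 83, 94, 99, 92, 110]   # thousand quintals

origin = sum(years) / len(years)         # 1970.5
x = [yr - origin for yr in years]
a = sum(y) / len(y)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

trend = [a + b * xi for xi in x]
fluctuation = [yi - ti for yi, ti in zip(y, trend)]
for yr, t, f in zip(years, trend, fluctuation):
    print(yr, round(t, 2), round(f, 2))
```

Because the line passes through the mean, the fluctuations sum to zero, which is a quick arithmetic check.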


32. Fit a straight line trend by the method of least squares:

        Milk consumption                Milk consumption
Years   (million gallons)       Years   (million gallons)
1960        102.3               1964        114.8
1961        101.9               1965        118.7
1962        105.8               1966        124.5
1963        112.0               1967        102.9

(M.A. Econ., Jabalpur, 1973)

(Y = 110.362 + 1.889x; origin 1963.5)
33. Fit a straight line trend by the method of least squares to the following data:

        Sales                      Sales
Years   (in '000 units)    Years   (in '000 units)
1968       100             1971       124
1969       120             1972       136
1970       118             1973       140

Also compute the trend values for the various years. What is the monthly increase in the sales?

(Y = 123 + 7.257x; origin 1970.5)
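With x in years, the slope b is the annual increase, so the average monthly increase is b / 12. The following minimal sketch applies this to Exercise 33.

```python
# Exercise 33: least-squares line with x in years from 1970.5; monthly increase = b / 12
years = [1968, 1969, 1970, 1971, 1972, 1973]
sales = [100, 120, 118, 124, 136, 140]   # '000 units

origin = sum(years) / len(years)
x = [yr - origin for yr in years]
a = sum(sales) / len(sales)
b = sum(xi * yi for xi, yi in zip(x, sales)) / sum(xi * xi for xi in x)

trend = {yr: round(a + b * xi, 2) for yr, xi in zip(years, x)}
monthly_increase = b / 12                # thousand units per month
print(trend, round(monthly_increase, 4))
```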
34. Below are given the figures of production (in thousand tons) of a fertilizer factory:

Years   Production    Years   Production
1958       70         1963       85
…                     1964       91
…                     1965      100

(i) Fit a straight line by the least squares method, and tabulate the trend values.
(ii) Eliminate the trend. What components of the time series are thus left over?
(iii) What is the monthly increase in the production of fertilizers?

(Y = 87 + 4.178x; taking 1962 as origin)
35. The following table gives the Sterling Assets of the Reserve Bank of India in crores of rupees:

Years     1966-67   1967-68   1968-69   1969-70   1970-71   1971-72
Assets       83        92       171        93       169       191

(i) Represent the data graphically.
(ii) Fit a straight line trend.
(iii) Show the trend on the graph.
Also estimate the figure for 1976-77.
(B. Com., Mysore, 1972)
f Y-1164 22574? S, 
(Yiore-12= 285275 crores J 
36. Fit a straight line trend by the method of least squares to the following data:

Years           1     2     3     4     5     6     7     8     9    10
Size of item  110   125   115   135   150   165   155   175   180   200

(M.A., Rajasthan, 1969)
(Y = 151 + 9.515X)


37. The production of pig iron during 1967-73 is given below:

        Production                    Production
Years   (lakhs of tonnes)     Years   (lakhs of tonnes)
1967        48                1971        45
1968        50                1972        41
1969        58                1973        49
1970        52

(i) Fit a straight line trend to these figures.
(ii) Show the trend line on a graph. (B. Com., Mysore)

(Y = 49 − x; origin 1970)
38. The following are the annual profits in thousands of rupees in a certain business:

Years                   1961   1962   1963   1964   1965   1966   1967
Profits ('000 Rs.)        60     72     75     65     80     85     95

(a) Use the method of least squares to fit a straight line to the above data.
(b) Plot the above figures and draw the line.
(c) Also make an estimate of the profits for the year 1976.

[Y = 76 + 4.857x; Y1976 = 134.284 th. rupees]
39. Fit a straight line trend by the method of least squares to the following data relating to the net profits of a public concern:

        Profits               Profits
Years   (Rs. '000)    Years   (Rs. '000)
1960       300        1964       …
1961       700        1965      100
1962        60        1966       …
1963       800

(B. Com., Osmania, 1967)
(Y = 714.286 + 85.714x)
40. From the data given below determine the equation of the line of trend (assuming it to be linear) and hence find the expected value for 1966:

        Production             Production
Years   (in tonnes)    Years   (in tonnes)
1956       110.2       1961        55.0
1957       143.3       1962        74.0
1958       143.8       1963       129.0
1959       134.5       1964       150.0
1960       138.0       1965       140.0

(B. Com., Madras, 1967)
[Y = 121.78 − 0.143X; Y1966 = 120.994]


41. Estimate, by means of the method of least squares, the equation for trend and the increase in the number of telephones in India.

Years                  1965-66  1966-67  1967-68  1968-69  1969-70  1970-71  1971-72
Telephones installed
in thousands (Y)          278      309      335      378      424      481      521

(Clearly specify the origin and the units of the variables in the trend equation obtained.) (M. Com., Calcutta)

(Y = 389.43 + 41.5X; taking the middle year as origin)


42. Fit an equation of the type Y = a + bX + cX² to the following data:

        Production                   Production
Years   (in '000 tonnes)     Years   (in '000 tonnes)
1968        70               1971        80
1969        72               1972        90
1970        88

(Y = 81.143 + 4.8X − 0.571X²)


43. Fit a parabola of the second order to the following data:

        Sales                          Sales
Years   (in million tonnes)    Years   (in million tonnes)
1960       100                 1963       100
1961       105                 1964       112
1962       115                 1965       118

(Y = 107.656 + 2.743X + 0.232X²; origin = 1962.5)
44. The population of a State at ten-yearly intervals is given below:

Years (x)   Population (in millions) (y)   Years (x)   Population (in millions) (y)
1901 39 1941 е 
Г 129 
Bu з 1951 mi 
i ; 1961 232 
3 9:6 1971 305 


By fitting a curve of the form y = ab^x to this data, estimate the population for 1981. (Estimated population for 1981 = 41.37 million)
45. The following table gives cash receipts from farm marketings:

Months    1970    1971    1972
Jan.      51.3    61.5    55.9
Feb.      27.4    26.3    28.4
March     27.3    24.1    21.5
April     22.4    21.4    23.1
May       32.8    29.8    27.0
June      29.7    28.9    25.3
July      32.3    32.0    26.7
Aug.      34.1    29.8    28.6
Sept.     47.7    61.7    51.6
Oct.      76.0    82.8    74.7
Nov.      71…     55.8    57.9
Dec.      55.5    63.8    58.5

Obtain the seasonal variations. (B. Com., Bombay, 1972)

46. The following table shows the number of letters posted in a particular area during a typical period of four weeks. Assuming that the trend value during the period remains the same, calculate 'seasonal indices' (here daily indices) as percentages of the grand average.


Weeks    Sun.   Mon.   Tue.   Wed.   Thu.   Fri.   Sat.   Total
1         18    161    170    164    153    181     76     923
2         18    165    169    147    158    190     80     927
3         21    162    169    153    145    190     82     922
4         20    165    170    155    150    180     85     925
Total     77    653    678    619    606    741    323   3,697

(I.C.W.A., Jan., 1969)
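With a constant trend, each daily index is that day's average divided by the grand average, times 100, so the seven indices total 700. The following sketch uses Friday of week 1 = 181 and Thursday of week 2 = 158. Both values are inferred, because they are the only ones consistent with the printed row and column totals.

```python
# Exercise 46: daily indices as percentages of the grand average (trend assumed constant).
# Assumptions: week 1 Fri = 181 and week 2 Thu = 158, inferred from the printed totals.
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
weeks = [
    [18, 161, 170, 164, 153, 181, 76],
    [18, 165, 169, 147, 158, 190, 80],
    [21, 162, 169, 153, 145, 190, 82],
    [20, 165, 170, 155, 150, 180, 85],
]

n_weeks = len(weeks)
day_avg = [sum(w[d] for w in weeks) / n_weeks for d in range(len(days))]
grand_avg = sum(sum(w) for w in weeks) / (n_weeks * len(days))
indices = {day: 100 * avg / grand_avg for day, avg in zip(days, day_avg)}
for day, idx in indices.items():
    print(day, round(idx, 2))
```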


47. Compute the average seasonal movement in the following series:

QUARTERLY DEATHS (thousand persons)

Years    I      II     III    IV
1968     3.5    …      …      …
1969     3.5    …      …      …
1970     3.3    …      …      …
1971     4.0    …      …      …
1972     4.1    …      …      …

48. The following table gives quarterly expenditure data over a number of years. Obtain seasonal corrections for the data:

Quarter   1970   1971   1972
I          78     84     92
II          …     64     70
III         …     61     63
IV          …     82     83

49. Calculate the seasonal indices in the case of the following quarterly data in certain units. You may use any method you think appropriate:

Years    Q1    Q2    Q3    Q4
1960      …    21     …     …
1961      …    23     …     …
1962      …    26     …     …
1963      …    23     …     …

(M. Com., Calcutta, 1967)
50. Compute the average seasonal movement for the following data relating to the revenue expenditure, Government of India:

REVENUE EXPENDITURE (Rs. CRORES)

                           Quarters
Years      April-June   July-Sept.   Oct.-Dec.   Jan.-March
1963-64        36           43           44          102
1964-65        39           44           37           98
1965-66        47           53           58          104
1966-67        47           56           60          130

(I.C.W.A., July, 1967)


51. Find trend values (mixed with cyclical movements, if any) from the following data of output, by the method of moving averages:

                    Years
Quarters   1965   1966   1967   1968
I           29     40     47     45
II          37     42     51     49
III         43     55     63     60
IV          34     43     53     48


52. Find out the seasonal index from the following table:

Seasons        1970   1971   1972   1973   1974
1st Quarter     40     42     41     45     44
2nd Quarter     35     37     35     36     38
3rd Quarter     38     39     38     36     38
4th Quarter     40     38     42     41     42

(B. Com., Agra, 1975)
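Exercise 52 does not name a method. The following sketch assumes the method of simple averages: average each quarter over the years, then express each quarterly average as a percentage of the average of the four quarterly averages. This method makes no allowance for trend.

```python
# Exercise 52: seasonal index by the method of simple averages (assumed method)
quarters = {
    "Q1": [40, 42, 41, 45, 44],
    "Q2": [35, 37, 35, 36, 38],
    "Q3": [38, 39, 38, 36, 38],
    "Q4": [40, 38, 42, 41, 42],
}

q_avg = {q: sum(v) / len(v) for q, v in quarters.items()}   # average for each quarter
grand = sum(q_avg.values()) / len(q_avg)                     # average of the averages
seasonal_index = {q: 100 * m / grand for q, m in q_avg.items()}
print({q: round(i, 2) for q, i in seasonal_index.items()})
```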
53. Obtain seasonal fluctuations from the following time series:

QUARTERLY OUTPUT OF COAL FOR FOUR YEARS

                 Quarters
Years    I     II    III    IV
1965     65    58    56     61
1966     98    63    63     61
1967     70    59    56     52
1968     60    55    51     58

(I.C.W.A., Jan., 1969)


54. Calculate five-yearly moving averages of the number of students studying in a commerce college as shown by the following figures:

Year               1961   1962   1963   1964   1965
No. of students     332    317    357    392    402
Year               1966   1967   1968   1969   1970
No. of students     405    410    427    405    438

(B. Com., Bombay, 1970)

[Years: 1963, 1964, 1965, 1966, 1967, 1968; 5-yearly moving averages: 360.0, 374.6, 393.2, 407.2, 409.8, 417.0]
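The 5-yearly moving averages of Exercise 54 can be reproduced with the same windowed mean used for 3-yearly averages. With an odd period, each average is placed against the middle year of its window.

```python
# Exercise 54: 5-yearly moving averages centred on the middle year of each window
years = list(range(1961, 1971))
students = [332, 317, 357, 392, 402, 405, 410, 427, 405, 438]

period = 5
ma5 = [sum(students[i:i + period]) / period
       for i in range(len(students) - period + 1)]
centres = years[period // 2: period // 2 + len(ma5)]   # 1963 ... 1968
print(list(zip(centres, ma5)))
```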
55. The number of units of a product exported during 1960-67 is given below. Fit a straight line trend to the data. Plot the given data showing also the trend line. Find an estimate for the year 1968.

Year                    1960   1961   1962   1963   1964   1965   1966   1967
No. of units ('000)       12     13     13     16     19     23     21     23

(B. Com., Bombay, 1970)
(Y = 17.5 + 1.786x; origin 1963.5; Y1968 = 25.537)
56. The population of Tamil Nadu at the successive census years is given below:

Year x                    1911   1921   1931   1941   1951   1961   1971
Population y (in lakhs)    193    209    216    235    263    301    337

Fit a curve of the form y = ab^x.
(B.Sc., Madras, 1972)
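Taking logarithms turns y = ab^x into the straight line log y = log a + x log b, which is fitted by least squares. The following sketch codes x in decades from 1941 (so Σx = 0) and uses natural logarithms. Common logarithms, as usual in hand work, give the same a and b.

```python
from math import exp, log

# Exercise 56: fit y = a * b**x via a least-squares line on log y
years = [1911, 1921, 1931, 1941, 1951, 1961, 1971]
pop = [193, 209, 216, 235, 263, 301, 337]     # lakhs

origin = 1941                                  # middle census year, so sum(x) == 0
x = [(yr - origin) / 10 for yr in years]       # x in decades
logs = [log(p) for p in pop]

log_a = sum(logs) / len(logs)
log_b = sum(xi * li for xi, li in zip(x, logs)) / sum(xi * xi for xi in x)
a, b = exp(log_a), exp(log_b)                  # y = a * b**x, origin 1941, x-unit 10 years
print(round(a, 2), round(b, 4))
```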
57. Give an account of the method of least squares in fitting a straight line trend to a time series. Fit a straight line trend to the following data on the domestic demand for motor fuel:

Year                 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966
Average monthly
demand
(million barrels)      61   66   72    …   82   90   96  100  103  100  114

(B.A., Madurai, 1970)
(Y = 88.18 + …x)
58. Fit a straight line trend to the data given below by the method of least squares and present the original data and trend values on graph paper:

Year                     1959   1960   1961   1962   1963   1964   1965
Factory value (Rs. …)     672    824    968   1205   1464   1758   2057

(B. Com., Delhi, 1971)
(Y = 1278.29 + 232.82x)
59. (a) What do you understand by 'Secular Trend'? Discuss at least one method for estimating trend. (M.B.A., Delhi, 1971)
(b) Discuss briefly the different components into which a time series may be analysed. Explain any method of isolating trend values in a time series. (…, 1972)

… 'Time Series'. Discuss the various methods of … (M.B.A., part-time, 1972)


60. (a) Compare the 'ratio to moving average' and the 'ratio to trend' as methods of analysing seasonal variations.

(b) (i) Given the trend equation
Y,-35 T5X43X? 


SMRE-—10'77-9 
where 1968 = 0
and X-unit = 1 year.
Change the origin of the equation to 1974.
(ii) Given the equation
Yc = 10(1.5)^X,
where 1968 = 0
and X-unit = 1 year.
Shift the origin forward by two years.

(iii) The trend of the annual sales of the Bharat Aluminium Company is described by the following equation:

Yc = 12 + 0.7X,
where 1970 = 0,
X-unit = 1 year
and Y-unit = annual production.

Step the equation down to a month-to-month basis and shift the origin to January, 1970. (M. Com., Delhi, 1972)
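Part (iii) can be sketched as follows. This assumes the usual textbook convention that "1970 = 0" marks the middle of 1970 (1 July), and that monthly values are centred at mid-month, so mid-January lies 5.5 months earlier. Stepping down divides the constant by 12 (annual to monthly totals) and the slope by 144: by 12 for the change of Y-unit and by 12 again because X is now counted in months.

```python
# Exercise 60 (iii): step an annual trend down to monthly and shift the origin.
# Assumption: 1970 = 0 means 1 July 1970; monthly values are centred at mid-month.
a_annual, b_annual = 12.0, 0.7          # Yc = 12 + 0.7X, X in years

a_month = a_annual / 12                 # monthly level at 1 July 1970
b_month = b_annual / 144                # monthly change per month

shift = -5.5                            # mid-January 1970 is 5.5 months before 1 July
a_jan = a_month + b_month * shift       # new constant with origin at January 1970
print(f"Yc = {a_jan:.4f} + {b_month:.5f}X  (origin Jan. 1970, X-unit = 1 month)")
```

The same two steps, dividing by 12 and 144 and then shifting the constant, apply to Exercise 78.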


61. Compute the trend by four-week moving average for the following data:


Week 1 2 3 4 5 6 7 8 9 10
Production Ва: a o СА RS 1735 039. 777155985 74 75 
Week 11 12 13 14 15 16 17 18 19 20


Production AS4- 3005/13 TOE ANDRES ДЕ? TO AYER 79 
(B. Com., Bombay, 1971) 


62. Fit a straight line trend by the method of least squares to the following data. Assuming that the same rate of change continues, what would be the predicted earnings for the year 1972?

        Earnings                  Earnings
Years   (lakhs of Rs.)    Years   (lakhs of Rs.)
1963        38            1967        69
1964        40            1968        60
1965        65            1969        87
1966        72            1970        95

[Y = 65.75 + 7.333x; Y1972 = 106.082] (B. Com., Delhi, 1972)
63. The following data give the total expenditure incurred by a certain college during the respective years:

Years                        1967   1968   1969   1970   1971   1972
Expenditure (Rs. in lakhs)    1.5    1.8    2.0    2.3    2.4    2.6

Estimate the expenditure likely to be incurred by the above college during the year 1973. (B. Com., Poona, 1973)
(Y = 2.1 + 0.217x; Y1973 = 2.86)
64. Fit a trend line by the method of four-yearly moving averages to the following time series data. Depict the trend line graphically:

Year                  1954  1955  1956  1957  1958  1959  1960  1961  1962
Production of sugar
(lakhs of tons)          6     7     7     6     8     9    10     9
Year                  1963  1964  1965
Production of sugar
(lakhs of tons)         10    11    11

(B.A. Hons. Econ., Delhi, 1973)


65. Compute 3-yearly moving averages for the following data giving yearly sales (in thousand rupees) of a firm for 10 years:

Years   1961  1962  1963  1964  1965  1966  1967  1968  1969  1970
Sales      8     9     8    10     9    12    13    14    12    15

(B. Com., Poona, 1973)

66. A store has an annual trend equation for its sales index at

Yis = 100 + 3X

If its December seasonal is 200 and its July seasonal 60, which was a relatively better month for sales, July 1960 when the sales index was at 70 or December 1958 when the index was at 220?

(B.A. Hons. Econ., Delhi, 1972)


67. Compute the trend values for the following time series using four-weekly moving averages:

        Production of steel          Production of steel
Week    (in '00 tonnes)      Week    (in '00 tonnes)
1            82              11           75
2            73              12           73
3            74              13           75
4            75              14           76
5            73              15           75
6            72              16           75
7            76              17           78
8            76              18           76
9            74              19           78
10           75              20           79

(B. Com., Poona, 1973)
68. Prepare a monthly seasonal index from the following data, with the help of the moving average method:

Monthly sales of X Y Z Products Co. Ltd. (in Rs.)

                 Years
Month     1969     1970     1971
Jan.      3,639    3,913    4,393
Feb.      3,591    3,856    4,530
March     3,326    3,714    4,287
April     3,469    3,820    4,405
May       3,321    3,647    4,024
June      3,320    3,458    3,992
July      3,205    3,476    3,795
Aug.      3,205    3,354    3,492
Sept.     3,255    3,594    3,571
Oct.      3,550    3,830    3,923
Nov.      3,771    4,183    3,984
Dec.      3,772    4,482    3,880

(B.A. Hons. Econ., Delhi, 1972)
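The following sketch assumes that "moving average method" here means the ratio-to-moving-average method, and reads October 1971 as 3,923. The steps are:

1. Compute a centred 12-month moving average.
2. Express each month's sales as a percentage of that average.
3. Average the ratios month by month.
4. Adjust the results so that the twelve indices total 1,200.

```python
# Exercise 68: seasonal index by the ratio-to-moving-average method (assumed reading)
sales = {
    1969: [3639, 3591, 3326, 3469, 3321, 3320, 3205, 3205, 3255, 3550, 3771, 3772],
    1970: [3913, 3856, 3714, 3820, 3647, 3458, 3476, 3354, 3594, 3830, 4183, 4482],
    1971: [4393, 4530, 4287, 4405, 4024, 3992, 3795, 3492, 3571, 3923, 3984, 3880],
}
y = [v for yr in sorted(sales) for v in sales[yr]]   # 36 months, Jan. 1969 first

# 12-month moving totals, centred by averaging adjacent pairs (divide by 24)
totals = [sum(y[i:i + 12]) for i in range(len(y) - 11)]           # 25 totals
centred = [(t1 + t2) / 24 for t1, t2 in zip(totals, totals[1:])]  # Jul. 1969 .. Jun. 1971

# Ratio of each original figure to its centred moving average, in per cent
ratios = {m: [] for m in range(12)}                  # month 0 = January
for k, ma in enumerate(centred):
    t = k + 6                                        # month on which this average is centred
    ratios[t % 12].append(100 * y[t] / ma)

avg_ratio = [sum(ratios[m]) / len(ratios[m]) for m in range(12)]
factor = 1200 / sum(avg_ratio)                       # force the twelve indices to total 1200
adjusted = [r * factor for r in avg_ratio]
seasonal_index = [round(r, 2) for r in adjusted]
print(seasonal_index)
```

With three years of data, each month gets exactly two ratios. The first and last six months of the series have no centred average.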
69. The following table gives average weekly sales (in thousand rupees) for each quarter during the years 1968-71 for a certain company. Compute the seasonal indices for each quarter from this data:

           Average Weekly Sales
                 Quarter
Year      I        II       III      IV
1968    13.27    14.32    13.79    14.95
1969    15.14    16.18    14.70    15.77
1970    16.03    16.93    15.38    16.55
1971    16.79    17.63    16.27    17.81

(B. Com., Bombay)
70. Fit a straight line trend of the type
Y = a + bX
by the method of least squares to the following time series data. Calculate also the trend values.

Year               1967  1968  1969  1970  1971  1972  1973  1974  1975
Production
('000 tonnes)         11    13    15    14    15    16    16    17    18

[Y = 15.0 + 0.733x]


71. Find the trend values by the method of least squares and plot them on a graph:

Year                   1964   1965   1966   1967   1968   1969
Production (in crores
of pounds)                7     10     12     14     17     24

(B. Com., Kurukshetra, 1975)
(Y = 14 + 3.086x; origin 1966.5)
72. Fit a trend line by the method of least squares to the following data:

Years             1970   1971   1972   1973   1974
Sales (tonnes)      10     12     18     20     25

(Yc = 17 + 3.8x; origin 1972)
73. Fit a trend line by the method of least squares to the following data and estimate the sales for 1977:

Year                1970   1971   1972   1973   1974   1975
Sales (tonnes)        10     12     13     15     20     14

(Yc = 14 + 1.314x; origin 1972.5; Y1977 = 19.913)
74. Fit a second degree polynomial trend curve to the following hypothetical observations on per capita income:

Year                  1975   1976   1977   1978   1979
Per capita income      600    620    660    700    750

What value of the series would you forecast for 1980?
[Y = 657.4283 + 38X + 4.2848X²; Y1980 = 809.991] (M.A. Econ., Patiala, 1974)
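With X measured from the middle year, ΣX = ΣX³ = 0, so the normal equations for Y = a + bX + cX² split into b = ΣXY/ΣX² and a 2×2 system for a and c. The following minimal sketch solves Exercise 74 this way.

```python
# Exercise 74: second-degree trend Y = a + bX + cX^2 with X from 1977
years = [1975, 1976, 1977, 1978, 1979]
y = [600, 620, 660, 700, 750]

origin = 1977
x = [yr - origin for yr in years]
n = len(x)
sx2 = sum(xi ** 2 for xi in x)
sx4 = sum(xi ** 4 for xi in x)
sy = sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))

# Normal equations when sum(x) = sum(x^3) = 0:
#   sy   = n*a   + sx2*c
#   sx2y = sx2*a + sx4*c
#   sxy  = sx2*b
b = sxy / sx2
c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
a = (sy - sx2 * c) / n

forecast_1980 = a + b * 3 + c * 3 ** 2
print(round(a, 4), b, round(c, 4), round(forecast_1980, 3))
```

Exact arithmetic gives 810.0. The printed 809.991 results from substituting the rounded coefficients.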
75. Production of a certain commodity is given below:

Years           1969   1970   1971   1972   1973
Production
('000 tonnes)      7      9     10      7      5

Fit a parabolic curve of the 2nd degree to the data, estimate the production for 1975 and comment on it.
(Y = 9.31 − 0.6X − 0.857X²)
76. Fit a straight line trend by the method of least squares to the following indices:

Years        1961   1962   1963   1964   1965   1966   1967
Index No.     127    101    130    132    126    142    137

(B. Com., Punjabi, 1975)
[Y = 9.314 − 0.6X − 0.857X²; Y1975 = −6.798, not possible; assumption wrong]
77. Apply the method of least squares to obtain trend values from the following data:

Years                 1969   1970   1971   1972   1973
Sales (lakh tonnes)    100    120    110    140     80

(M.B.A. Full-time, 1975)

[Y = 110 − 2x; origin 1971]


78. The trend equation for yearly total sales (in '000 Rs.) for a commodity with the year 1971 as origin is y = 816 + 28.8x. Determine the trend equation to give monthly trend values with Jan. 1972 as origin and calculate the trend value for March … (M. Com., Delhi, 1975)


79. (a) What are the various components of a time series ? Analyse how 
these components can be separated from each other when a time series of a 
variable is given. 

(b) In analysing the trend component in a time series, one may use either
the 'moving average method' or fit a polynomial by the method of least squares.
Explain the two methods in detail.

(c) The following table gives the number of motor cars produced in two
countries over ten years :


Year 1 2 3 4 5 6 7 8 9 10 
Country A 96 74 68 50 99 172 245 302 332 345
Country B 254 231 201 172 189 187 166 203 200 202


Evaluate the trend in the two time series by the method of moving averages.


Clearly explain how you would determine the period of moving averages.
(B. A. Hons., Econ., Delhi, 1975)
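Below is a short Python sketch of simple moving averages for the two series in question 79(c). The question also asks how to choose the period. A common guideline is to match it to the length of any regular cycle in the data, so the 3-year period used here is only an illustration. The sketch handles odd periods only, because even periods need an extra centring step. `moving_average` is an illustrative name.

```python
def moving_average(values, period):
    """Simple moving averages for an odd period; the k-th average is centred
    on the middle item of window k."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only; even periods need centring")
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


country_a = [96, 74, 68, 50, 99, 172, 245, 302, 332, 345]
country_b = [254, 231, 201, 172, 189, 187, 166, 203, 200, 202]

# With a 3-year period the first average is centred on year 2 and the last on year 9.
trend_a = moving_average(country_a, 3)
trend_b = moving_average(country_b, 3)
print([round(v, 2) for v in trend_a])
print([round(v, 2) for v in trend_b])
```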


SECTION 15 
INTERPOLATION AND EXTRAPOLATION 


1. What is meant by interpolation ? What are the assumptions on which
methods of interpolation are based ? (B. Com., Mysore, 1969)

2. Explain briefly the usefulness of interpolation and extrapolation in
statistical studies.
3. Distinguish between interpolation and extrapolation. When does the
need to interpolate or extrapolate arise ?
(B. Com., Mysore, 1966 ; B. Com., Kurukshetra, 1974)


4. What do you understand by 'interpolation' ? Show clearly the
necessity of interpolation by taking a few concrete examples.
(B. Com., Mysore, 1967)

5. Describe any two algebraic methods of interpolation you know of,
stating clearly the assumptions involved. (B. Com., Madras, 1965) 


6. What is interpolation ? How does it differ from forecasting ? Explain, 
giving examples, how these are useful in commerce and industry.
(B. Com., Madras, 1964 ; B. Com., Bangalore, 1968)


7. State Newton's interpolation formula for equal intervals and the 
assumptions underlying it. (I.C.W.A., 1968 ; M. Com., Agra, 1969)


8. State Newton's formula for interpolation and discuss some of its uses. 
Explain why Newton's forward formula is to be used for interpolating values at
the top of the table. (I.C.W.A., July, 1969)


9. Explain Lagrange's method of interpolation. 
(M. A., Econ., Agra, 1973) 
10. Develop Newton's formula for interpolation for equal intervals. State
the assumption underlying it. (M.A., Econ., Delhi, 1963)


11. What are the assumptions underlying interpolation of missing figures
in a series ? (M.A., Econ., Delhi, 1966)


12. Comment on the necessity and usefulness of interpolation. Describe 
the graphic method of interpolation. (I.C.W.A., January, 1970)


13. Find by algebraic interpolation the index number for 1970 from the
following table of index numbers of production of a certain article in India :


Years 1968 1969 1971 1972 
Index Number 100 107 157 212 
(B. Com., Madras, 1972 ; M. A. Econ., Jabalpur, 1974) 

(124) 
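One widely taught "algebraic" method is the binomial-expansion method. When n values are known, it assumes that the n-th leading difference of the full, equally spaced series is zero, and solves that single equation for the missing entry. The sketch below is a minimal version under that assumption. `fill_missing_binomial` is a hypothetical name, and `None` marks the gap.

```python
from math import comb


def fill_missing_binomial(values):
    """Estimate the single missing entry (None) of an equally spaced series by
    setting the n-th leading difference to zero, n = number of known values."""
    n = len(values) - 1                       # order of the vanishing difference
    gap = values.index(None)
    # Δⁿy0 = Σ (-1)^(n-k) C(n, k) y_k = 0, split into known part and unknown term
    known = sum((-1) ** (n - k) * comb(n, k) * y
                for k, y in enumerate(values) if y is not None)
    coeff = (-1) ** (n - gap) * comb(n, gap)
    return -known / coeff                     # solve coeff * y_gap + known = 0


print(fill_missing_binomial([100, 107, None, 157, 212]))   # 124.0
```

The same call gives 447 for question 14 and 96.4 for question 22. With `None` in the last position (question 19) it extrapolates 4,247 for 1981.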


14. Use one of the methods of interpolation to estimate the business done 
in 1970 from the following data : 


Years 1967 1968 1969 1971 1972
Business done (Rs. in lakhs) 150 235 365 525 780

(B. Com., Madras) 

(447) 


15. From the following table find the interpolated figure for the population
in 1955 :


Years 1940 1950 1960 1970 

Population of a town 25,974 29,003 32,528 36,070
(30,733) 
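Here is a minimal sketch of Newton's forward-difference formula for equally spaced arguments. It builds the leading differences y0, Δy0, Δ²y0, ... and accumulates the coefficients u(u-1)...(u-k+1)/k!. With x = 1955 it gives about 30,733, the figure printed after question 15. With 1956 it gives about 31,089. `newton_forward` is an illustrative name.

```python
def newton_forward(xs, ys, x):
    """Newton's forward-difference formula for equally spaced xs."""
    h = xs[1] - xs[0]
    u = (x - xs[0]) / h
    # Leading differences y0, Δy0, Δ²y0, ... taken from the difference table
    leading, row = [], list(ys)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    result, coeff = 0.0, 1.0
    for k, diff in enumerate(leading):
        result += coeff * diff
        coeff *= (u - k) / (k + 1)    # next coefficient u(u-1)...(u-k)/(k+1)!
    return result


years = [1940, 1950, 1960, 1970]
population = [25974, 29003, 32528, 36070]
print(round(newton_forward(years, population, 1955)))   # 30733
```

Because the formula is a polynomial in u, it also extrapolates. For question 19 (1931–1971), `newton_forward(..., 1981)` returns 4,247.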


16. The following table relates to the income earned per month by a
certain number of workers in a big manufacturing concern :


Earning per month (in Rs.) Number of workers

Up to 10 50
Up to 20 150
Up to 30 300
Up to 40 500
Up to 50 700
Up to 60 800
It is required to find out the number of workers falling within the
Rs. 25–35 earning group. (M. Com., Agra, 1970)


[178] 
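For a class frequency such as the Rs. 25–35 group, the usual approach interpolates the cumulative ("up to") frequencies at both class limits and takes the difference. The sketch below does that with Newton's forward formula. The helper is redefined here so the block runs on its own, and its name is illustrative.

```python
def newton_forward(xs, ys, x):
    """Newton's forward-difference formula for equally spaced xs."""
    u = (x - xs[0]) / (xs[1] - xs[0])
    leading, row = [], list(ys)
    while row:                        # build the difference table column by column
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    result, coeff = 0.0, 1.0
    for k, diff in enumerate(leading):
        result += coeff * diff
        coeff *= (u - k) / (k + 1)
    return result


upper_limits = [10, 20, 30, 40, 50, 60]          # "Up to Rs. ..."
cum_workers = [50, 150, 300, 500, 700, 800]      # cumulative number of workers
up_to_35 = newton_forward(upper_limits, cum_workers, 35)
up_to_25 = newton_forward(upper_limits, cum_workers, 25)
print(round(up_to_35 - up_to_25))                # 178
```

The same pattern handles question 27. Interpolating the "below" series at 25 and subtracting the 45 persons below Rs. 20 gives about 31.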
17. The following are the marks obtained by 492 candidates in a certain
examination :


Not more than 40 marks 212 candidates
Not more than 45 marks 296 candidates
Not more than 50 marks 368 candidates
Not more than 55 marks 429 candidates
Not more than 60 marks 460 candidates
Not more than 65 marks 480 candidates
Not more than 70 marks 490 candidates
Not more than 75 marks 492 candidates


Find the number of candidates who secured more than 42 but not more
than 45 marks. (M.A., Econ., Agra, 1973)


18. From the following data, estimate expectation of life at the age of 
22 years, stating clearly the assumption involved in the use of the formula :


Age 10 15 20 25 30 35 
Expectation 
of life (in years) 35.4 32.2 29.1 26.0 23.1 20.4


(M. Com., Allahabad, 1969 ; I.A.S., 1968 ; B. Com., Mysore)


19. Using any algebraic method extrapolate the population for 1981. 


Years 1931 1941 1951 1961 1971 
Population 2,522 2,514 2,791 3,168 3,613
(in lakhs)

(4,247)


20. The weekly wages paid to 492 employees of a soap factory are as
under :


Rs. Employees Rs. Employees
Not more than 40 212 Not more than 60 460
Not more than 45 296 Not more than 65 481
Not more than 50 368 Not more than 70 490
Not more than 55 429 Not more than 75 492
Using Newton's method, interpolate the number of employees who got
more than Rs. 42 but not more than Rs. 45. (M.A., Rajasthan, 1969 ;
M.A., Econ., Meerut, 1973)
(40) 


21. The population of a city in the decennial censuses was as given below.
Estimate the population for 1965.


Years Population 
(in thousands) 
1931 46 
1941 66 
1951 81
1961 93 
1971 101 


(I.C.W.A., 1968)
[96.84 thousand]
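Question 21 asks for 1965, which lies near the end of the table. Newton's backward-difference formula, built on the trailing differences ∇y_n, ∇²y_n, ..., is the natural choice there. This minimal sketch assumes equally spaced census years. `newton_backward` is an illustrative name. It gives about 96.84 thousand.

```python
def newton_backward(xs, ys, x):
    """Newton's backward-difference formula for equally spaced xs."""
    h = xs[1] - xs[0]
    u = (x - xs[-1]) / h              # negative when x lies inside the table
    trailing, row = [], list(ys)
    while row:
        trailing.append(row[-1])      # y_n, ∇y_n, ∇²y_n, ...
        row = [b - a for a, b in zip(row, row[1:])]
    result, coeff = 0.0, 1.0
    for k, diff in enumerate(trailing):
        result += coeff * diff
        coeff *= (u + k) / (k + 1)    # next coefficient u(u+1)...(u+k)/(k+1)!
    return result


years = [1931, 1941, 1951, 1961, 1971]
population_000 = [46, 66, 81, 93, 101]
print(round(newton_backward(years, population_000, 1965), 2))   # 96.84
```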
22. The following table gives the amount y of cement in thousands of
tonnes manufactured in India in the year x. Find the missing term : 


x y
1956 39 
1958 85 
1960 ? 
1962 151
1964 264 
1966 388 


(I.C.W.A., July, 1970)
[96.4 thousand tonnes]


23. The following are the amounts of income-tax paid by a few businessmen
during one year :


More than Rs. 500 600
More than Rs. 1,000 550
More than Rs. 1,500 425
More than Rs. 2,000 275
More than Rs. 2,500 100
More than Rs. 3,000 25


Find out the number of businessmen who paid more than Rs. 1,200 but
not more than Rs. 2,400 as income-tax. (I.C.W.A., July)


24. The following table gives the normal weight of babies during the first 
twelve months of life : 


Age in Months 0 2 5 8 10 12 


Weight (lb.) 7 101 15 16 18 21
Find the weight of a 7-month old baby. (M.A., Econ., Delhi, 1967)


(Hint : Use Lagrange's method ; 15.66 lb.)


25. The following table gives the normal weight of a baby during the first 
six months of life : 


Age in months 0 2 3 5 6 
Weight in lb. 5 7 8 10 12
Estimate the weight of a baby at the age of 4 months. 

(I.C.W.A., Jan., 1970) 
(Use Lagrange's method ; 8.9 lb.)
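Lagrange's formula does not need equally spaced arguments. Each known value is weighted by the product of (x - x_j)/(x_i - x_j) over all other points j. The sketch below is a minimal version. `lagrange` is an illustrative name. For question 25 it gives about 8.89 lb at 4 months.

```python
def lagrange(xs, ys, x):
    """Lagrange's interpolation formula; the xs need not be equally spaced."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total


ages = [0, 2, 3, 5, 6]            # months
weights_lb = [5, 7, 8, 10, 12]
print(round(lagrange(ages, weights_lb, 4), 2))   # 8.89
```

Applied to question 13 (1968, 1969, 1971, 1972), the same function returns 124 for 1970. This agrees with the binomial-expansion answer because both fit the same cubic through four points.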


26. The working class cost of living Index Numbers for a certain place are
given below for certain years. Interpolate the missing number. 


Year Ind. 

1968 LS. No. 
6! 

1970 200

1971 2 

1972 278 

1973 250 


(B. Com., Mysore)
(284) 


27. From the following data estimate the number of persons in the income
group Rs. 20 to Rs. 25 :


Income No. of persons 
Below Rs. 10 20
Below Rs. 20 45
Below Rs. 30 115
Below Rs. 40 210
Below Rs. 50 325


(B. Com., Nagpur, 1969) 


(Hint : Apply Newton's method. Persons in the income group
Rs. 20–25 = 31)


28. From the following data estimate the number of persons earning 
wages between Rs. 60 and 70 : 


Wages (Rs.) below 40 40–60 60–80 80–100 100–120
No. of persons ('000) 250 120 100 70 50
(B. Com., Mysore, Oct., 1969 ; M.A. Econ., Meerut, 1974)
(53.6 thousand)


29. Interpolate the value of the premium to be paid, by using Newton's
method, when the age next birthday is 17, from the following table :


Age next birthday in years 15 25 35 45 55
Premium in Rs. 11.1 12.6 14.3 16.1 18.3
(B. Com., Mysore, April, 1969)
(Rs. 11.4)


30. Using Lagrange's method of interpolation find from the data given
below the number of agricultural labourers earning between Rs. 30 and Rs. 40.


Earning in Rs. No. of agricultural labourers
15–20 73
20–30 97
30–45 110
45–55 180
55–70 140


(63)
31. If lx represents the numbers living at age x in a life table, interpolate,
by using Newton's method, the value of lx for x = 35 :


o L39— 439, L49—346 

Lso=2 

(M. A., Econ., Punjab, 1970)
(394)


32. Estimate the probable number of lecturers earning between Rs. 400
and Rs. 425 from the following data :


Income Rs. No. of lecturers
300–350 120
350–400 145
400–450 200
450–500 250


500–550 150
(B. Com., Nagpur, 1971)
(90)


(M.A., Econ., Punjab, 1973)
33. Estimate the number of men of 34 years of age from the following
data :


Age in years 20 30 40 50 60 70 
Number of men 600 530 525 275 100 25 


(M. A., Econ., Agra, 1971) 


34. Using Newton's method of interpolation find from the data given
below the number of factories earning less than Rs. 45,000 as profits :


Profits (Rs '000) 30—40 40—50 50—60 60—70 70—80 
No. of factories 31 42 51 35 31 


(M.A., Econ., Punjab, 1969)


(48)
35. Interpolate the missing figure in the following table with the help of a
suitable formula : 


1961 1962 1963 1964 1965 1966 1967 
1,331 1,728 2,197 ? 3,375 4,096 4,913 
(B. Com., Nagpur, 1972) 

(2744) 


36. Given the following railway fare schedule, estimate the railway rate
per kilometre for a journey from Meerut to Mysore, a distance of 2,650
kilometres.


Distance (km.) 500 1,000 1,500 2,000 2,500 3,000
Railway Fare ( Ёз.) 50:5 90:5 1285 1000 1900 2200 


[199.724 ; 0.0752]


(M.A., Econ., Meerut, 1971)
37. From the following table showing the number of visitors to a film,
interpolate the number of visitors in 1970 by using Lagrange's formula :


Years 1968 1969 1971 1972 
Million visitors 5,112 6,514 9,069 9,685 


(M.A. Econ.,Punjab, 1973) 


38. Using Lagrange's formula, estimate from the following data the
number of workers getting not more than Rs. 26 per month :
Income not exceeding (Rs.) : 15 25 30 35
No. of workers : 36 40 45 48
(B. Com., Punjabi, 1975)


SECTION 16 
VITAL STATISTICS 


1. What is meant by ‘Vital Statistics’ ? How are such statistics collected 
and for what purpose are they used ? (B. Com., Andhra, 1970)


2. (a) Distinguish between crude and specific birth and death rates. 
How are they computed? What are their uses ? 


(B.A. Hons. Econ., Delhi, 1975) 


(b) Define gross and net reproduction rate. How are they computed ? 
What interpretation can be made if the net reproduction rate is 1, less than 1,
greater than 1 ? (M.Com. Business Statistics, 1974)




3. Write short explanatory notes on :
(i) Expectation of Life
(ii) Standardized Death Rate
(iii) Gross Reproduction Rate
(iv) Net Reproduction Rate.
(M. Com., Gorakhpur, 1969)


4. Explain the concepts of gross and net reproduction rates. Discuss the
steps for estimating the net reproduction rate.
(B. A. Hons., Econ., Delhi, 1972) 


5. Distinguish between crude death rate and standardized death rate. 
How are the two calculated ? What is the advantage of using standardized death
rates ? (B. A. Hons., Econ., Delhi, 1971 ; B. Com., Poona, 1973) 


6. What is meant by vital statistics ? How are these statistics used for 
making population projections ? (M. Com., Delhi, 1971) 
7. (a) Write short notes on : 


(i) Net Reproduction Rate 
(ii) Life Table. (M. Com., Delhi, 1971) 


(b) Distinguish between general fertility rate and gross reproduction rate. 

(B.A. Hons. Econ., Delhi, 1974) 

8. Calculate the crude and standard death-rates from the following
data :


Age group   Population   No. of    Standard age distribution
            ('000)       deaths    (per 1,000)
0–9         21           350       221
10–24       36           192       298
25–44       37           229       285
45–64       17           354       149
65 and over 5            415       47


[C.D.R. = 13.18 ; S.D.R. = 13.46]
9. The following table shows the number of women in age groups and the
number of female children born in one year :


Age group   No. of women   No. of female children born
            ('000)         ('000)
15–19       363            1.95
20–24       367            14.34
25–29       354            25.16
30–34       324            21.52
35–39       288            13.76
40–44       253            5.32
45–49       225            0.46


Compute for each age group the number of female children born per
annum per 1,000 women, i.e., the fertility rate.

Assuming that the fertility rate is the same at each year of age in each
group and is unchanged over a generation, and assuming no deaths, compute the
average number of female children born to a woman during the reproductive
period (ages 15–50). [1.264]
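Here is a minimal sketch of the computation question 9 asks for. Each group's fertility rate per 1,000 women is counted for five years of age, summed, and divided by 1,000. With no deaths, this is the gross reproduction rate. The number of women aged 40–44 is printed illegibly in the source. The sketch assumes 253 (thousand), the value that reproduces the printed answer of 1.264.

```python
# Question 9 figures ('000); the 40-44 count of women (253) is an assumption.
women = [363, 367, 354, 324, 288, 253, 225]
female_births = [1.95, 14.34, 25.16, 21.52, 13.76, 5.32, 0.46]

fertility_per_1000 = [1000 * b / w for b, w in zip(female_births, women)]
# Each 5-year group contributes 5 years of childbearing at its annual rate.
average_female_children = 5 * sum(fertility_per_1000) / 1000

for rate in fertility_per_1000:
    print(round(rate, 2))
print("average female children per woman:", round(average_female_children, 3))   # 1.264
```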
10. Complete the following table and compute the crude and the standardized
death rates :


Sex      Age group    Population  No. of   Specific death   Standard age    Expected deaths
                                  deaths   rate per 1,000   distribution    per 1,000
                                                            per 1,000
Males    0–4          2,110       30       ?                59              ?
         5–14         3,340       6        ?                109             ?
         15–34        7,320       16       ?                177             ?
         35–59        7,960       70       ?                121             ?
         60 and over  3,240       196      ?                34              ?
Females  0–4          2,010       27       ?                55              ?
         5–14         3,230       8        ?                102             ?
         15–34        7,310       20       ?                180             ?
         35–59        8,750       57       ?                122             ?
         60 and over  4,280       230      ?                41              ?
Total                 49,550      660                       1,000


[C.D.R. = 13.32 ; S.D.R. = 9.03]
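The following Python sketch shows crude and directly standardized death rates for question 10. The crude rate is total deaths per 1,000 of total population. The standardized rate applies each group's specific rate to the standard age distribution. Three entries are illegible in the source and are reconstructed here from the printed totals: males 35–59 population 7,960, females 15–34 population 7,310, and females 5–14 standard 102. Treat those values as assumptions. The result, about 9.02, differs slightly from the printed 9.03, presumably because the printed working rounds the specific rates.

```python
# Question 10, males then females; age groups 0-4, 5-14, 15-34, 35-59, 60 and over.
# Assumed (reconstructed from totals): 7,960, 7,310 and the standard value 102.
population = [2110, 3340, 7320, 7960, 3240,
              2010, 3230, 7310, 8750, 4280]
deaths = [30, 6, 16, 70, 196,
          27, 8, 20, 57, 230]
standard = [59, 109, 177, 121, 34,          # standard age distribution per 1,000
            55, 102, 180, 122, 41]

crude_rate = 1000 * sum(deaths) / sum(population)
specific_rates = [1000 * d / p for d, p in zip(deaths, population)]
expected_deaths = [r * s / 1000 for r, s in zip(specific_rates, standard)]
standardized_rate = sum(expected_deaths) * 1000 / sum(standard)

print(round(crude_rate, 2), round(standardized_rate, 2))   # 13.32 9.02
```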


11. Distinguish between crude rates and corrected rates.


Calculate the crude and standardised death rates of the local population
from the following data and compare them with the crude rate of the standard 
population. What inference do you draw from the comparison ? 


Age group   Standard population     Local population
(years)     Population   Deaths     Population   Deaths
0–10        600          18         400          16
10–20       1,000        5          1,500        6
20–60       3,000        24         2,400        21
60–100      400          20         700          21


(B. Com., Bombay, 1968)


12. (a) Distinguish between crude rates and corrected rates with
reference to death rates.
(b) From the following data calculate the crude accident rates of the two
factories A and B and also the corrected accident rate of B taking A as the
standard. Comment upon your results.


Experience   Factory A                  Factory B
in years     No. of      No. of         No. of      No. of
             workers     accidents      workers     accidents

Under 100 40 | | 30 

5–15 1,500 150 560 40
15–25 850 37 400 24
25 and over 50 100 6

Total 2,500 335 2,000 870


(B. Com., Bombay, 1972) 


13. What do you understand by Crude Birth Rate ? Is it an accurate
measure of the population growth of a locality ? If not, how can it be modified
to give better results ? (M. Com., Delhi, 1971)


14. What are crude and standardized rates ? Why is comparison on the
basis of standardized rates more reliable ?


Calculate the crude and standardized death rates from the following data : 


Age          Population   Death rate     Standard age
                          per 1,000      distribution
0–10         400          40             600
10–20        1,500        4
20–60        2,400        10
60 and over  700          30


(B. Com., Bombay, 1969) 


15. From the data given below calculate the gross reproduction rate
assuming that for the given population the ratio of female babies to total births is
48.896 per cent :

Age group 16–20 21–25 26–30 31–35 36–40 41–45 46–50
Fertility rate per 1,000 women 19 170 253 201 157 67 9


(M. Com., Delhi, 1968) 
[G.R.R. = 2.145 per woman]
16. Explain clearly the method of computing the figures in columns (4),
(5) and (6) of the following table : 


Age group    Population   No. of deaths   Specific death   Standard age    Computed deaths
(years)                   in one year     rate per 1,000   distribution    per 1,000 of total
                                                           per 1,000       population
(1)          (2)          (3)             (4)              (5)             (6)

0–1          40,500       3,234           79.88            25.5            2.04
1–19         7,04,000     1,960           2.78             398.0           1.11
20–39        5,14,900     2,251           4.37             269.6           1.18
40–59        2,56,600     2,965           11.56            192.3           2.22
60 and over  82,800       5,400           60.10            114.6           6.89
Total        16,053,800   15,810                           1,000.0         13.44

(I.A.S., 1968)


17. Compute the gross reproduction rate for the following data. What
additional information, if any, would you need for calculating the net reproduction
rate ?


Age group   Total female   No. of female   No. of
            population     births          survivors
15–20       2,89,148       9,244           7,432
20–25       3,08,466       51,264          48,834
25–30       3,00,890       56,077          41,230
30–35       3,00,569       39,405          28,015
35–40       2.215.637      20,418          13,624
40–45       2,38,286       5,572           3,342
45–50       2,29,345       406             204

(I.A.S., 1969)


18. From the following information calculate the standardized death rates
for countries A and B :


Age group   Death rate per 1,000       Standard population
            Country A    Country B     (in lakhs)
0–4         18.870       4.348         120
5–14        0.759        0.465         200
15–24       1.385        0.767         183
25–34       2.048        1.075         175
35–44       3.326        1.882         120
45–54       7.006        4.669         90
55–64       18.111       12.477        7
65–74       45.195       34.060        31
75 & above 12'258 116433 10 


(B.A. Hons. Econ., Delhi, 1971)
19. From the information given below, estimate both the gross and net
reproduction rates in a country :
(i) Proportion of live female births to total live births is 4, 


(ii) Table showing the expected survivals and specific fertility rates for
1,000 newly born baby girls :
Exact age   Expected survival       Specific fertility
in years    at the age given in     rate per annum
            col. (1), per 1,000     per 1,000
(1)         (2)                     (3)
0           1,000                   -
15          974                     44.17
20          972                     220.46
25          969                     216.23
30          966                     127.16
35          961                     62.79
40          953                     18.28
45          942                     1.32
50          924                     -


(B.A. Hons. Econ. Delhi, 1972) 


20. Distinguish between crude and standardized death rates. Why should 
the standardized estimates be preferred over crude ones? Estimate the standardized 
death rates for the following two countries : 


Age group    Death rate per 1,000        Standard population
(in years)   Country I    Country II     (in lakhs)
0— 4 £0 00 5:00 100 
5–14 1.00 2.00 200
15—24 140 гоо 190 
25–34 2.00 1.00 180
35–44 3.30 2.00 120
45–54 7.00 5.00 100
55–64 15.00 12.00 70
65–74 40.00 35.00 30
75 & above 120.00 110.00 10


(B.A. Hons., Econ., Delhi, 1973)


21. (a) Discuss how the use of standardized death rate makes it possible
to make comparison between two towns. (B.A. Hons. Econ., Delhi, 1975)


(b) "By whatever method standardized death rate may be calculated, it is
a hypothetical value with a specific use". Explain. (M. Com., Delhi, 1975)


22. Compute (i) the gross reproduction rate, and (ii) the net reproduction
rate from the data given below :


Age       Number of   Number of female      Number of total       Survival
(years)   women       births to women       births to women       rate
          ('000)      in the age group      in the age group
15–19     160         140                   260                   0.969
20–24     164         1,130                 2,244                 0.967
25–29     158         980                   1,894                 0.963
30–34     152         670                   1,320                 0.958
35–39     148         460                   916                   0.952
40–44     150         150                   280                   0.942
45–49     145         80                    145                   0.928


(B.A. Hons. Econ., Delhi, 1974) 


Me xí . 4,40 0125, dı1=80 and 7;,—316790: 
In a certain life table 2108.000, 90 for the ages 10 and 11 years 


23. Calculate the values of lx, dx, px, qx, Lx, Tx and e°x. (M. Com., Delhi, 1975)
SECTION 17
INTERPRETATION OF DATA 


1. What is meant by interpretation of statistical data ? What precautions 
are to be taken while interpreting the data ? (B. Com., Marathwada, 1969)


2. Account for the wrong interpretations from statistical data and state 
the precautions necessary for avoiding them. (C.A., May, 1966)


3. What do you understand by interpretation of data? Illustrate the 
types of mistakes that frequently occur in interpretation of economic statistics, 
and suggest necessary precautions to avoid them. (M.A. Econ. Punjab, 1968) 


4. Point out the ambiguity or mistake found in the following statements
made on the basis of the facts given : 


(i) 80% of the people who die of cancer are found to be smokers and so
it may be concluded that smoking causes cancer.


(ii) The gross profit to sales ratio of a company was 15% in the year 1964
and was 10% in 1965. So an auditor concludes that stock must have been under-
valued.


(iii) The average output in a factory was 2,500 in January and 2,400 in
February 1965 and so workers were more efficient in January. (C.A., May, 1965)


(iv) Rate for a certain commodity in the first week is 8 kgs. for a rupee
and in the second week 12 kgs. for a rupee. So the average price is (8+12)/2 = 10
kgs. for a rupee.


(v) The rate of increase in the number of buffaloes in India is greater than
that of the population. So the people of India are now getting more milk per
head.


(vi) The increase in the price of a commodity was 25%. Then the price
decreased by 20% and again increased by 10%. So the resultant increase in the
price was 15%.


(vii) According to the estimates of an economist, the per capita national
income of India for 1931-32 was Rs. 65. The National Income Committee esti-
mated the corresponding figure for 1948-49 as Rs. 225. Hence in 1948-49 Indians
were really 4 times as prosperous as in 1931-32.


(viii) A man travels 50 miles at a speed of 20 miles per hour and then
returns at a speed of 30 miles per hour. So the average speed for the whole
journey is (20 + 30)/2, i.e., 25 miles per hour. (B.A., Hons., Econ., Delhi, 1969)


(ix) The rate of dividend of a company increased from 10% in 1969-70
to 15% in 1970-71. Hence the company has made 1½ times more profits in
1970-71.


(x) The net profit of a company increased from Rs. 50,000 in 1969-70 to
Rs. 60,000 in 1970-71. Hence the company has become more efficient.


(xi) The price of a commodity increased from Rs. 20 in 1960 to Rs. 80
in 1970. Hence the price of the commodity has increased four times, i.e., 400
per cent.


(xii) The death rate has declined from 20 in 1969-70 to 18 in 1970-71. 
Hence people are getting better medical facilities. 
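Three of the statements in question 4 fail because of arithmetic, and a few lines make the point concrete. Successive percentage changes compound multiplicatively (vi). The average speed over equal distances is total distance divided by total time, the harmonic mean of the speeds (viii). Rising from Rs. 20 to Rs. 80 makes the new price four times the old one, which is a 300 per cent increase (xi). This is an illustrative check, not the book's worked solution.

```python
# (vi) successive percentage changes compound; they do not add
factor = 1.25 * 0.80 * 1.10
net_change_pct = 100 * (factor - 1)

# (viii) equal distances at different speeds: total distance / total time,
# i.e. the harmonic mean of the two speeds
distance = 50
total_time = distance / 20 + distance / 30
average_speed = 2 * distance / total_time

# (xi) Rs. 20 -> Rs. 80: four times the old price, but a 300 per cent increase
increase_pct = 100 * (80 - 20) / 20

print(round(net_change_pct, 2), round(average_speed, 2), increase_pct)   # 10.0 24.0 300.0
```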
5. What precautions are necessary in the interpretation of economic
statistics ? Ilustrate your answer with reference to the following statistics relating 
to Indian national and per capita income : 


Year      National Income      Per capita Income
          (Rs. abja)           (in Rs.)
1960–61   95.3    88.5         265.2   246.3
1965–66   99.8    104.8        260.6   273.6
1969–70   128.4   117.6        318.4   291.6


(М.А., Rajasthan, 1973) 


6. Interpret the following data relating to Indian prices : 


                                 Base          Nov.    Nov.    Nov.
                                               1962    1963    1964
Wholesale prices :
(a) All Commodities              1952-53=100   130.1   134.7   156.8
(b) Food Articles                1952-53=100   129.6   136.6   166.3
Consumer prices (All-India)      1949=100      134.0   138.0   163
Security prices                  1952-53=100   170.5   174.5   163.5
(Variable dividend)


(M. A. Econ., Punjab, 1968)


7. Give an interpretative note on the following figures relating to Indian
national income :
Net National Product (in Rs. abja)   1950-51   1955-56   1960-61   1964-65
(i) at current prices                95.3      99.8      141.4     200.1
(ii) at 1948-49 prices               88.5      104.8     127.3     150.5


Per Capita Income (in Rs.)
(i) at current prices                266.5     255.0     325.7     421.5


(ii) at 1948-49 prices               247.5     267.8     293.2     317.0
(M.A. Econ., Punjab, 1969) 


8. Comment on the following :
(i) 90% of the people who take beer die before reaching the age of 70
years ; hence beer is bad for long life.


(ii) The examination result of school x was 80 per cent in the year 1971. In
the same examination only 800 out of 1,200 students were successful at
school y. Hence the teaching standard of school x is better than that of y.


(iii) The imports of foodgrains in India are increasing ; hence the production
of foodgrains in India is going down.
(iv) The population of India has doubled during the last 10 years. Hence
the birth rate has also doubled.


(v) Intelligent fathers have intelligent sons and intelligent grandfathers
have intelligent grandsons ; therefore, intelligence is hereditary.


(vi) In 1971 the death rate in an industrial town was 14 per thousand
whereas in another city the death rate in the same year was 15 per thousand.
Hence, compared to city life, life in the industrial town is healthier.


(vii) A vast majority of students in a hostel spend Rs. 200 per head per
month. Therefore, the total monthly expenditure of 160 students of a hostel was
Rs. 20,000.


(viii) A merchant usually receives 50 customers in a day. Therefore, the
total number of customers received by him during the month is 1,500.


9. 'Statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions.' Explain the above statement and com-
ment on the nature of statistical analysis. (M.A. Econ., Punjab, 1967)


10. Comment on the following arguments :
(i) Since 325 out of a random sample of 600 persons in Lucknow are
found to be smokers, therefore the majority of men in Lucknow are smokers.


(ii) If two coins are thrown up together, either they will come down alike
or different ; therefore the chances are equal, or the chance of a head or a tail
is 1/2.


(iii) If two coins are thrown up together there are three possible results :
two heads or two tails or a head and a tail. Therefore the chance of a head and
a tail is 1/2. (M.A. Econ., Lucknow, 1963)


11. (a) Enumerate the chief causes for wrong interpretations from
statistical data.


(b) Point out the mistake or ambiguity in the following statements :


(i) The average cost of production was Rs. 1.50 in 1950 and Rs. 1.75
in 1951, and so the factory has become inefficient.


(ii) A person goes from X to Y on cycle at 20 m.p.h. and returns at
24 m.p.h. His average speed was 22 m.p.h.


(iii) In a factory the standard deviation of wages was Rs. 8. After one
year, the standard deviation was Rs. 10. Wages have become more variable.
13. (a) State the various reasons why errors are made in the interpretation
of statistical data, and indicate how to guard against such errors.
(b) Point out the mistakes or wrong inferences in the following statements :
(i) The per capita national income in India increased from Rs. 300 in
1950 to Rs. 900 in 1970 and so the prosperity of Indians has increased 3 times.
(ii) Since we had more than double the normal rainfall during the current
year, we can expect a record yield in rice.


(iii) You are given below the following details relating to the wages in
respect of two factories from which it is concluded that the skewness and
variability are the same in both the factories.


            Factory A   Factory B
            (Rs.)       (Rs.)
Mean        50          45
Mode        45          50
Variance    100         100


(C.A., 1973)


SECTION I
PROBABILITY
1. (a) Define probability and explain the importance of this concept in 
Statistics. (M. Com., Delhi, 1971)


(b) What are the different schools of thought on the interpretation of
'probability' ? How does each school define probability ? Explain with suitable
examples. (M. Com., Delhi, 1975)
2. Define the probability of the occurrence of an event. State and prove
the addition theorem of probability. (M.A., Econ., Delhi, 1967)


3. Explain what you understand by the term probability. State and
prove the addition and multiplication theorems of probability.
(M.A. Econ., Delhi, 1966)
4. Explain the concepts of independent and mutually exclusive events in
probability. State the theorems of total and compound probability.
(I.C.W.A., 1965)
5. When are two events said to be independent in the probability
sense ? Give examples of dependent and independent events.
(M.A. Econ., Delhi, 1968)
6. Explain the terms 'mutually exclusive' and 'independent events'.
Show that the chance that the two independent events happen is the product of
their chances of happening separately. How does this get altered when they are not
independent ?
7. (a) What are the various schools of thought on the concept of
probability ? Explain giving suitable examples.
(b) Differentiate between the circumstances when the probabilities of two
events are
(i) added, and
(ii) multiplied. (M. Com., Gorakhpur, 1966)
8. (a) What is meant by compound event in probability ? Prove that
P(A + B) = P(A) + P(B) - P(AB).
(Mathematical Statistics, Delhi, 1968)
(b) Discuss briefly Bayes' theorem on conditional probability.
(M. A. Econ., Patiala, 1974)
9. A card is drawn at random from a pack of cards. What is the probability
that it is either a 'heart' or the queen of 'spades' ? [14/52]
10. A, B, C and D in order toss a coin. The first one to throw a head
wins. What are their respective chances of winning ? [8/15, 4/15, 2/15, 1/15]
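Question 10 reduces to a geometric series. Assuming a fair coin, player k (counting from 0) wins on some round only if all earlier tosses are tails. Summing q^(4r + k)·p over rounds r gives q^k·p / (1 - q^4). The sketch below uses exact fractions. `turn_order_chances` is an illustrative name.

```python
from fractions import Fraction


def turn_order_chances(players, p_head=Fraction(1, 2)):
    """Chance that each player, tossing in a fixed order, is first to get a head."""
    q = 1 - p_head
    # Player k (0-based) wins in round r after players*r + k tails:
    # sum over r of q**(players*r + k) * p = q**k * p / (1 - q**players)
    return [q ** k * p_head / (1 - q ** players) for k in range(players)]


print(turn_order_chances(4))
# [Fraction(8, 15), Fraction(4, 15), Fraction(2, 15), Fraction(1, 15)]
```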


11. A bag contains 3 red, 4 black and 2 white balls. What is the probability
of drawing a red and a white ball, each ball being put back after it is drawn ?


[1/27]
12. What is the chance that a leap year, selected at random, will contain
53 Sundays ? (M. Sc., Agra)


13. Two cards are drawn from a well-shuffled pack of 52 cards. Find the
probability that they are both aces if the first card is (a) replaced, (b) not replaced.

[(a) 1/169, (b) 1/221]

14. A six-faced die is so biased that it is twice as likely to show an even
number as an odd number when thrown. It is thrown twice. What is the
probability that the sum of the two numbers thrown is even ? (B.Com., Delhi)
15. A person throws two dice together. Find the chance that the sum of
the dots appearing will be 8. (M. Com.)
[p = 5/36]

16. A bag contains 5 white and 4 black balls. They are drawn out one
by one. Find the chance that the balls drawn are alternately white and black.


[1/126]

17. A coin is tossed thrice. What is the chance of getting all heads ?
(M. Com., Delhi, 1967)
[1/8]


18. A chain is made of 10 links. The tensile strength of each link is
equally likely to have any value between 60 and 75 lb. What is the probability
that the chain will break under a load of 65 lb ? (M.A. Econ., Delhi)
19. Three coins are tossed simultaneously. What is the probability that
they will fall two heads and one tail ? (M. Com., Delhi)


20. From an urn containing 6 white and 4 black balls, 3 balls are drawn
at random. Find the probability that 2 are white and 1 is black :


(i) if each ball is returned before the next is drawn,


(ii) if the three balls are drawn successively without replacement.
[(i) 54/125, (ii) 1/2]
21. Urn A contains 5 red balls and 5 black balls, urn B contains 4 red
balls and 8 black balls, and urn C contains 3 red balls and 6 black balls. A ball is
drawn from A, colour unknown, and put into B. Then a ball is drawn from B,
colour unknown, and put into C. What is the probability that a ball now drawn
from C will be red ? [87/260]


22. (a) What is the probability that a vowel selected at random in an
English book is an i ?


(b) In a single throw with two dice, find the chances of throwing (i) 8
and (ii)

(c) Four cards are drawn without replacement. What is the probability
that they are all aces ? (M. Com., Delhi, 1970)


[(a) p = 1/26, (b) (i) p = 5/36 and (ii) p = 1/18, (c) p = 1/270725]
23. A can hit a target 3 times in 5 shots, B 2 times in 5 shots and C 3
times in 4 shots. They fire a volley. What is the probability that 2 shots hit ?

(M.A. Econ., Punjab, 1972)
[9/20]
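For question 23 it helps to enumerate all hit/miss patterns of the three independent shooters and add the probabilities of the patterns with exactly two hits. The sketch below does that with exact fractions. `prob_exactly` is an illustrative name, and independence of the three shots is assumed, as the question intends.

```python
from fractions import Fraction
from itertools import product


def prob_exactly(k, probs):
    """P(exactly k successes) for independent trials with the given success chances."""
    total = Fraction(0)
    for outcome in product((True, False), repeat=len(probs)):
        if sum(outcome) == k:
            term = Fraction(1)
            for hit, p in zip(outcome, probs):
                term *= p if hit else 1 - p
            total += term
    return total


hit_chances = [Fraction(3, 5), Fraction(2, 5), Fraction(3, 4)]   # A, B, C
print(prob_exactly(2, hit_chances))   # 9/20
```

The same helper covers question 32 through the complement. If the three students' chances are 1/2, 1/3 and 1/4, then 1 - `prob_exactly(0, ...)` gives 3/4.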
24. Three dice are thrown. What is the probability of at least one of the
points turning up being greater than 4 ? [19/27]
25. A man is selected for interview for three separate posts. At the first
interview there are five candidates, at the second four candidates and at the third
six candidates. If the chances of selection of the candidates are equally likely,
what is the chance of the man obtaining at least one of the posts ? [1/2]


26. Two groups, each of three children, contain respectively two boys and
one girl, and one boy and two girls. One child is drawn at random from each group.
Calculate the probability that (a) both will be boys, (b) one will be a boy and the
other a girl, and (c) at most one boy will be selected. [(a) 2/9, (b) 5/9, (c) 7/9]

27. A speaks the truth in 80% of cases and B in 90% of the cases. In what
percentage of cases are they likely to contradict each other in stating the same
fact ?


28. Two cards are drawn at random from an ordinary pack. If either of
them is a king, or if both are kings, both cards are replaced ; otherwise they
are not replaced. Another card is then drawn at random. What is the probability
that it is a king ? [0.08]




29. Suppose there are 4 houses and 4 applicants. What is the probability
(i) that all 4 applied for the same house, (ii) that each of the four applied for a
different house ? [(i) 1/64, (ii) 3/32]
30. (a) During a war, one ship in 10 was sunk on the average in making a
certain voyage. What was the probability that at least 3 out of a convoy of 6
ships would arrive safely ?


(b) Three houses of the same type were advertised to be let in a locality.
Three men made separate applications for a house. What is the probability :


(i) that all three made applications for the same house ;
(ii) that each of the three applied for a different house ; and
(iii) that two of them applied for the same house and the third for
one of the other houses ?
[(a) 998730/1000000, (b) (i) 1/9, (ii) 2/9, (iii) 2/3]


31. If it is known from experience that rain falls at a station on 12 days
in every 30 days, find the probability that in a given week four days will be wet
and the remaining days dry. (M. Com., Delhi, 1965)
[3024/15625]
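Question 31 is a single binomial term with n = 7 days and p = 12/30. A minimal sketch, assuming wet days are independent from day to day; `binomial_pmf` is a hypothetical helper name, and exact fractions are used so the result can be compared with the printed answer.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, k, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_wet = Fraction(12, 30)  # rain on 12 days in every 30
print(binomial_pmf(7, 4, p_wet))  # 3024/15625
```

The same helper covers Question 36(a): `binomial_pmf(4, 2, Fraction(1, 2))` is 3/8, i.e. 37.5%.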

32. A problem in statistics is given to three students A, B and C whose
chances of solving it are 1/2, 1/3 and 1/4 respectively. What is the probability that the
problem will be solved?
(M.Sc., Agra, 1972; M.Com., Allahabad, 1967)
[p = 3/4]
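Questions 25 and 32 both use the complement rule: the chance of at least one success is one minus the chance that every independent attempt fails. A minimal sketch, reading the chances in Question 32 as 1/2, 1/3 and 1/4 (consistent with the printed answer 3/4); `prob_at_least_one` is a hypothetical name.

```python
from fractions import Fraction


def prob_at_least_one(probs):
    """Chance that at least one of several independent events occurs."""
    none = Fraction(1)
    for p in probs:
        none *= 1 - p  # every event fails
    return 1 - none


print(prob_at_least_one([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]))  # 3/4
```

For Question 25 the man's chances at the three interviews are 1/5, 1/4 and 1/6, and the same function returns 1/2.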
33. Three newspapers A, B and C are published in a certain city. It is
estimated from a survey of the adult population that:

20% read A, 16% read B, 14% read C, 8% read both A and B, 5% read
both A and C, 4% read both B and C, and 2% read all three. What percentage
reads at least one of the papers? Of those that read at least one, what
percentage reads both A and B? [(i) 35%, (ii) 23%]
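The two answers to Question 33 follow from inclusion-exclusion for three sets and a conditional share. A minimal sketch working directly in percentages of the adult population:

```python
from fractions import Fraction

# Percentages reading each paper or combination (Question 33).
a, b, c = 20, 16, 14
ab, ac, bc, abc = 8, 5, 4, 2

# Inclusion-exclusion for three sets.
at_least_one = a + b + c - ab - ac - bc + abc
print(at_least_one)  # 35

# Readers of both A and B as a share of readers of at least one paper.
share_ab = Fraction(ab, at_least_one)
print(float(share_ab) * 100)  # about 22.86, i.e. roughly 23%
```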

34. Two cubical dice whose faces are marked with the digits 1 to 6 are
thrown simultaneously. Find the probability that the sum of the digits on the
faces that turn up is 8. (M.A. Econ., Delhi)


35. According to the 1971 census, the sex ratio for the State of Assam is
876 females per 1,000 males. If this tendency is expected to continue, what is the
probability that a newly born baby is male? [1000/1876 or 250/469]


36. Out of 800 families with 4 children each, what percentage would be
expected to have (a) 2 boys and 2 girls, (b) at least one boy, (c) no girls, and (d)
at most 2 girls? Assume equal probability for boys and girls.

[(a) 37.5%, (b) 93.75%, (c) 6.25%, (d) 68.75%]


37. A and B decide to meet between 5 and 6 P.M. at Connaught Place, but
each should wait no longer than 10 minutes for the other. Determine the
probability that they meet. [11/36]
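Question 37 is a geometric-probability problem. A minimal sketch, assuming both arrival times are independent and uniform over the hour (minutes 0 to 60): the pair fails to meet only in two corner triangles of the 60 × 60 square, which together form a 50 × 50 square. The helper names are hypothetical, and the simulation is only a sanity check on the exact value.

```python
import random
from fractions import Fraction


def meeting_probability(window, wait):
    """Exact chance that two uniform arrivals on [0, window] are within `wait`."""
    miss = Fraction(window - wait, window) ** 2  # the two corner triangles
    return 1 - miss


def simulate_meeting(window, wait, trials=200_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    met = sum(
        abs(rng.uniform(0, window) - rng.uniform(0, window)) <= wait
        for _ in range(trials)
    )
    return met / trials


print(meeting_probability(60, 10))  # 11/36
print(simulate_meeting(60, 10))     # close to 0.3056
```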

38. (a) What is the probability of getting 9 cards of the same suit in one
hand at a game of bridge?
[$\dfrac{4 \times {}^{13}C_{9} \times {}^{39}C_{4}}{{}^{52}C_{13}}$]

(b) A, B and C in order toss a coin. The first one to throw a head wins.
What are their respective chances to win?
(Mathematical Statistics, Delhi, 1968)
[A 4/7, B 2/7, C 1/7]
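Question 38(b), and the dice games in Questions 44 and 75(c), share one pattern: players take turns in a fixed cycle, each with a constant chance of success on their own turn. Summing over rounds gives a geometric series. A minimal sketch; `first_success_chances` is a hypothetical helper.

```python
from fractions import Fraction


def first_success_chances(probs):
    """Players cycle in the given order, each succeeding on their own turn with
    a fixed chance; return each player's chance of being first to succeed."""
    # Chance that a whole round passes with nobody succeeding.
    fail_round = Fraction(1)
    for p in probs:
        fail_round *= 1 - p
    chances = []
    reach = Fraction(1)  # chance that play reaches this player in round one
    for p in probs:
        # Sum over rounds: geometric series with ratio fail_round.
        chances.append(reach * p / (1 - fail_round))
        reach *= 1 - p
    return chances


half = Fraction(1, 2)
print([str(c) for c in first_success_chances([half, half, half])])  # ['4/7', '2/7', '1/7']
```

With two players each needing a six, `first_success_chances([Fraction(1, 6)] * 2)` gives 6/11 and 5/11, which on a Rs. 11 prize are the expectations Rs. 6 and Rs. 5 of Question 44.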


39. In a group of an equal number of men and women, 10 per cent of the men and
45 per cent of the women are unemployed. What is the probability that a person
selected at random is employed? (M. Com., Delhi, 1967)
[0.725]


STATISTICAL METHODS 


40. Three groups of children contain respectively 3 girls and 1 boy,
2 girls and 2 boys, 1 girl and 3 boys. One child is selected at random from each
group. Show that the chance that the three selected consist of 1 girl and 2 boys
is 13/32.


41. A's chance of hitting a target is 1/3, B's chance is 1/4 and C's is 1/5.
If they fire together, find the chance of only one shot hitting the
target. [13/30]


42. A number is chosen at random from the integers 1, 2, 3, ..., n,
and A and B denote the events that it is a multiple of 2 and of 3 respectively. Show
that A and B are independent events when n = 96 but not when n = 100.
(M.A. Econ., Delhi, 1967)
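Question 42 can be verified by direct counting: A and B are independent exactly when P(A and B) equals P(A) × P(B). A minimal sketch; `are_independent` is a hypothetical name.

```python
from fractions import Fraction


def are_independent(n):
    """Check independence of A = 'multiple of 2' and B = 'multiple of 3'
    when a number is drawn uniformly from 1..n."""
    numbers = range(1, n + 1)
    p_a = Fraction(sum(1 for x in numbers if x % 2 == 0), n)
    p_b = Fraction(sum(1 for x in numbers if x % 3 == 0), n)
    p_ab = Fraction(sum(1 for x in numbers if x % 6 == 0), n)  # multiples of both
    return p_ab == p_a * p_b


print(are_independent(96), are_independent(100))  # True False
```

For n = 100 the counts are 50, 33 and 16, so P(A and B) = 4/25 while P(A) × P(B) = 33/200.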


43. In a single throw with two dice, what is the probability of throwing
(a) two aces, and (b) 7? (M.A. Econ., Delhi, 1968)
[(a) 1/36, (b) 1/6]


44. A and B throw with one die for a prize of Rs. 11, which is to be won
by the player who first throws 6. If A has the first throw, what are their
respective expectations? (M.Sc., Agra, 1968)
[A's expectation = 6/11 × 11 = Rs. 6, B's expectation = 5/11 × 11 = Rs. 5]


45. Three urns contain respectively 6 white and 4 black balls, 8 black and
2 white balls, and 10 white and 10 black balls. One black ball is drawn. Find the
probability that it came from the second urn. (8/17)

46. Five men in a company of 20 are graduates. If 3 men are picked out
of 20 at random, what is the probability that they are all graduates? What is the
probability of at least one graduate? (I.C.W.A., 1965)
[(a) 1/114, (b) 137/228]
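Question 46 (and the four-aces draw earlier) are hypergeometric: draws are made without replacement, so the counts come from combinations. A minimal sketch; `hypergeom_pmf` is a hypothetical helper built on the standard `math.comb`.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(population, marked, draws, k):
    """Chance of exactly k marked items when `draws` items are taken without
    replacement from `population` items, `marked` of which are marked."""
    return Fraction(
        comb(marked, k) * comb(population - marked, draws - k),
        comb(population, draws),
    )


all_graduates = hypergeom_pmf(20, 5, 3, 3)
at_least_one = 1 - hypergeom_pmf(20, 5, 3, 0)
print(all_graduates, at_least_one)  # 1/114 137/228
```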


47. A bag contains 5 white and 4 black balls. A ball is drawn from this
bag and replaced, and then a second draw of a ball is made. What is the
probability that the two balls drawn are of different colours? (M. Com., Agra, 1968)

[40/81]
48. A bag contains 5 red and 3 black balls and a second one 4 red and 5
black balls. One of these is chosen at random and a draw of two balls is made
from it. What is the probability that one of these is red and the other is black?
(M.A. Econ., Punjab, 1972)
[275/504]


49. (a) What is the probability of getting only one head in five tosses of a
coin?

(b) The probability that a boy will get a scholarship is 0.90 and that a
girl will get it is 0.80. What is the probability that at least one of them will get the
scholarship? (M.A. Econ., Punjab, 1966)
[(a) 5/32, (b) 0.98]
50. If a man purchases a raffle ticket, he can win a prize of Rs. 5,000 or
a second prize of Rs. 2,000 with probabilities 0.001 and 0.003 respectively. What
should be a fair price to pay for the ticket? (B. Com., Bombay, 1967)
[Rs. 11]
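A fair price is the mathematical expectation: each outcome's value times its probability, summed. A minimal sketch for Question 50; `expected_value` is a hypothetical name, and decimal strings are passed to `Fraction` so the probabilities stay exact.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)


raffle = [(5000, Fraction("0.001")), (2000, Fraction("0.003"))]
print(expected_value(raffle))  # 11
```

A loss enters as a negative value: for the umbrella seller of Question 69, `[(300, Fraction("0.3")), (-60, Fraction("0.7"))]` gives Rs. 48.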

51. The odds are 7 to 5 against A, a person who is now 30 years old, living
till he is 70 years, and the odds are 2 to 3 in favour of B, who is now 40 years of
age, living till he is 80 years. Find the chance that at least one of these two persons
will be alive 40 years hence. (B. Com., Bombay, 1966)
[13/20]
52. An urn contains 4 red and 3 blue balls. Two drawings of 2 balls are
made. Find the chance that the first drawing gives 2 red balls and the second
drawing gives 2 blue balls, if the balls are returned to the urn after the first draw.
(M. Com., Gorakhpur)
[2/49]


53. A bag contains 5 white and 3 black balls, and 4 successive draws of
one ball each are made; the balls are not replaced. What is the probability that
they are alternately of different colours?
54. (a) Explain, giving examples, the terms:
(i) Equally likely events,
(ii) Mutually exclusive events,
(iii) Mutually exhaustive events,
(iv) Independent events.
(b) Tickets numbered from 1 to 100 are well shuffled and a ticket is drawn.
What is the probability that the drawn ticket has:
(i) an odd number,
(ii) a number 5 or a multiple of 5, and
(iii) a number which is a square?

(c) The odds in favour of A winning a game of chess against B are 5 : 2.
If three games are to be played, what are the odds in favour of A's winning at
least one game? (M. Com., Bombay, 1970)

[(b) (i) 1/2, (ii) 1/5, (iii) 1/10; (c) 335 : 8]
55. (a) Suppose two six-faced dice are thrown 10 times. What is the
probability of getting double six in at least one of the throws?
(M. Com., Delhi, 1970)
[Hint: $p = 1 - \left(\frac{35}{36}\right)^{10}$]
(b) There are three urns. Urn I contains 3 red and 7 green balls, urn II
has 5 red and 3 green balls and urn III contains 8 red and 4 green balls. One
red ball is drawn from one of the urns. What is the chance that it came from
(i) urn I, and (ii) urn II?
[Hint: Bayes' Theorem]
[(i) 36/191, (ii) 75/191]
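Bayes' Theorem divides each joint probability (prior × likelihood) by their total. A minimal sketch for Question 55(b), assuming each urn is equally likely to be chosen; `posterior` is a hypothetical helper.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' theorem: P(source | evidence) for each candidate source."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    evidence = sum(joint)  # total probability of the observed evidence
    return [j / evidence for j in joint]


priors = [Fraction(1, 3)] * 3
red_given_urn = [Fraction(3, 10), Fraction(5, 8), Fraction(8, 12)]
post = posterior(priors, red_given_urn)
print(post[0], post[1])  # 36/191 75/191
```

The same function handles Question 45 (black-ball likelihoods 4/10, 8/10, 10/20 give 8/17 for the second urn) and Question 76(b), with the machines' shares of output as priors and their defective rates as likelihoods.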


56. An urn contains 10 white and 3 red balls. Another urn contains
3 white and 5 red balls. Two balls are transferred from the first urn and placed
in the second, and then one ball is taken from the latter. What is the probability
that it is a white ball? [59/130] (M. Com., Delhi, 1971)
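Question 56 conditions on how many white balls were transferred (0, 1 or 2) and then applies the theorem of total probability. A minimal sketch; the function name and its parameters are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_white_after_transfer(w1, r1, w2, r2, moved=2):
    """Move `moved` balls at random from urn 1 (w1 white, r1 red) into urn 2
    (w2 white, r2 red), then draw one ball from urn 2."""
    total = Fraction(0)
    for k in range(moved + 1):  # k = number of white balls transferred
        p_move = Fraction(comb(w1, k) * comb(r1, moved - k), comb(w1 + r1, moved))
        p_white = Fraction(w2 + k, w2 + r2 + moved)
        total += p_move * p_white
    return total


print(prob_white_after_transfer(10, 3, 3, 5))  # 59/130
```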


57. (a) Enunciate the ‘addition’ and ‘multiplication’ theorems of
probability.

(b) A pitcher is taken to the well every day for four years. If the odds are
1000 : 1 against its being broken on any particular day, show that the chance of
its ultimately surviving is more than 1/4.

(log 7 = 0.8451, log 11 = 1.0414, log 13 = 1.1139, antilog 0.4156 = 2.604)

(M.A. Econ., Delhi, 1970)
58. (a) State and explain Bayes' Theorem.

(b) A coin is tossed. If it turns up heads, two balls will be drawn from
urn A; otherwise two balls will be drawn from urn B. Urn A contains three
black and five white balls. Urn B contains seven black and one white ball. In
both cases, selections are to be made with replacement. What is the probability
that urn A is used, given that both the balls drawn are black?

м 1144]1619] (М. Com., Delhi, 1972) 
59. A factory produces a mechanism which consists of three independently
manufactured parts. It is known that 1 per cent of part one, 4 per cent of part
two and 2 per cent of part three produced are defective. What is the probability
that a complete mechanism is not defective? (M.B.A.)


60. P speaks the truth in 70 per cent of cases and Q in 85 per cent of cases.
In what percentage of cases are they likely to contradict each other in stating the
same fact? (M. Com., Allahabad)
[36%]


61. Two cards are randomly drawn from a deck of 52 cards and thrown
away. What is the probability of drawing an ace in a single draw from the
remaining 50 cards? (M. Com., Meerut, 1970)
[1/13]
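The answer 1/13 to Question 61 is unchanged by the discards, which may seem surprising. A minimal sketch that conditions on how many aces were thrown away confirms it; `ace_after_discards` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def ace_after_discards(discards=2, deck=52, aces=4):
    """Chance of an ace on the next draw after `discards` unseen cards are removed."""
    total = Fraction(0)
    for k in range(discards + 1):  # k = aces among the discarded cards
        p_k = Fraction(comb(aces, k) * comb(deck - aces, discards - k), comb(deck, discards))
        total += p_k * Fraction(aces - k, deck - discards)
    return total


print(ace_after_discards())  # 1/13
```

By symmetry, any card not yet seen is equally likely to be any of the 52, so removing unseen cards cannot change the answer.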


62. (i) In a random sample of 1984 wheat fields, 1124 were irrigated, 568
grew wheat mixed with barley and 204 were both irrigated and mixed with barley.
What is the probability that a field selected at random will be both unirrigated
and pure? [1/4]


(ii) A factory finds that on the average 20% of the bolts produced by
a given machine will be defective for certain specified requirements. If 10 bolts are
selected at random from the day's production of this machine, find the probability
that (i) exactly 2 will be defective, (ii) 2 or more will be defective, (iii) more than
5 will be defective. [(i) 0.3020, (ii) 0.6242, (iii) 0.00637]
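Each part of Question 62(ii) is a binomial probability or a sum of binomial terms with n = 10 and p = 0.2. A minimal sketch using plain floats; `binom_pmf` and `binom_at_least` are hypothetical helper names.

```python
from math import comb


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_at_least(n, k, p):
    """P(X >= k), summed term by term."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))


n, p = 10, 0.2
print(round(binom_pmf(n, 2, p), 4))       # 0.302  (exactly 2 defective)
print(round(binom_at_least(n, 2, p), 4))  # 0.6242 (2 or more)
print(round(binom_at_least(n, 6, p), 5))  # 0.00637 (more than 5)
```

The same tail sum answers Question 16 (`binom_at_least(6, 2, 0.2)`, about 0.345) and Question 26 (`binom_at_least(5, 3, 0.1)`, 0.00856).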


63. The probability that a man will be alive for 25 years is 3/5, and the
probability that his wife will be alive for 25 years is 2/3. Find the probability that
(a) both will be alive, (b) only the man will be alive, (c) only the wife will be alive,
(d) at least one will be alive. [(a) 2/5, (b) 1/5, (c) 4/15, (d) 13/15]

64. A bag contains 8 balls, identical except for colour, of which 5 are red
and 3 white. A man draws two balls at random. What is the probability that:

(i) one of the balls drawn is white and the other red;
(ii) both are red;
(iii) both are white?

[(i) 15/28, (ii) 5/14, (iii) 3/28]
65. A bag contains 6 white and 9 black balls. Two drawings of 4 balls are
made such that (a) the balls are not replaced before the second trial, (b) the
balls are replaced before the second trial. Find the probability that the first
drawing will give 4 white and the second 4 black balls in each case.
(M. Com., Allahabad, 1968)
[(a) 3/715, (b) 6/5915]
66. Ram and Shyam play for a prize of Rs. 810. Ram is to throw a die
first and is to win if he throws 6. If he fails, Shyam is to throw and is to win if
he throws 6 or 5. If he fails, Ram is to throw again and is to win if he throws 6
or 5 or 4, and so on. Find out their respective expectations.

(M. Com., Allahabad, 1969)
[Ram = Rs. 422.5, Shyam = Rs. 387.5]
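In Question 66 the number of winning faces grows by one on each throw, so the game must end by the sixth throw. A minimal sketch that tracks the chance the game is still running before each throw; `expectations` is a hypothetical name.

```python
from fractions import Fraction


def expectations(prize):
    """Ram throws on odd-numbered throws, Shyam on even; throw t wins with t faces."""
    ram = shyam = Fraction(0)
    still_playing = Fraction(1)
    for throw in range(1, 7):
        win_now = still_playing * Fraction(throw, 6)
        if throw % 2 == 1:
            ram += win_now
        else:
            shyam += win_now
        still_playing -= win_now
    return ram * prize, shyam * prize


r, s = expectations(810)
print(float(r), float(s))  # 422.5 387.5
```

The winning chances are 4056/7776 for Ram and 3720/7776 for Shyam; they sum to 1 because the sixth throw is certain to win.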
67. If a single draw is made from a pack of 52 cards, what is the probability
of securing either an ace of spades or a jack of clubs?
(M. Com., Allahabad, 1970)
(1/26)

68. In an experiment with an insecticide for flies, the Black Cloud
Company finds that 80 per cent are killed on the initial application but that
those which survive develop a resistance. The percentage of the survivors killed
in further applications is one-half as large as the percentage killed on the
immediately preceding application. What is the probability that a fly will survive four
applications given that it survived the first two? (0.0864)


69. If it rains, an umbrella salesman can earn Rs. 300 per day. If it is
fair, he can lose Rs. 60 per day. What is his expectation if the probability of rain
is 0.3? (Rs. 48)


70. (a) Two cards are randomly drawn from a deck of 52 cards and thrown
away. What is the probability of drawing an ace in a single draw from the
remaining 50 cards?

(b) A bag contains 8 balls, identical except for colour, of which 5 are
red and 3 white. A man draws two balls at random. What is the probability
that
(i) one of the balls drawn is white and the other red?
(ii) both are of the same colour? What would be the values of these
probabilities if a ball is drawn, replaced and then another ball is drawn?
[(a) 1/13; (b) (i) 15/32, (ii) 17/32] (M. Com., Meerut, 1971)
71. A purse contains 2 silver coins and 4 copper coins and a second purse
contains 4 silver coins and 3 copper coins. If a coin is selected at random from
one of the two purses, what is the probability that it is a silver coin? (19/42)
72. If a man purchases a raffle ticket he can win a first prize of Rs. 5,000
or a second prize of Rs. 2,000 with probabilities 0.001 and 0.003. What should
be a fair price to pay for the ticket? (Rs. 11)
73. A and B decide to meet between 5 and 6 p.m. but that each should 
wait no longer than 10 minutes for the other. Determine the probability that they 


meet. (M. Com., Nagpur, 1972)
(11/36) 


74. There are two bags, one of which contains 5 red and 7 white balls
and the other 3 red and 12 white balls. A ball is drawn from one or the other of
the two bags. Find the chance that it is (i) red, (ii) white.

[(i) 37/120, (ii) 83/120] (M. Com., Meerut, 1973)
75. (a) Two balls are drawn from a bag containing 8 red and 7 white
balls. Find the chance that:

(i) they are both red,
(ii) they are both white, or
(iii) one is red and the other white.

(b) From the 30 tickets marked with the first 30 natural numbers,
one is drawn at random; find the chance that the number on it is a multiple of
3 or 5.
(c) A and B toss an ordinary die alternately in succession. The
winner is the one who throws an ace first. If A is the first to throw, calculate their
probabilities of winning the game. (M. Com., Meerut, 1975)

[(a) (i) 4/15, (ii) 1/5, (iii) 8/15; (b) 7/15; (c) 6/11, 5/11]

76. (a) State and explain Bayes' theorem.

(b) A factory produces certain types of output by three machines. The
respective daily production figures are: Machine A 3,000 units; Machine B
2,500 units; and Machine C 4,500 units. Past experience shows that 1 per cent
of the output produced by Machine A is defective. The corresponding fractions
of defectives for the other two machines are, respectively, 1.2 and 2 per cent. An
item is drawn at random from the day's production run and is found to be
defective. What is the probability that it comes from the output of (i) Machine
A? (ii) Machine C? (M. Com., Delhi, 1975)
(c) Two dice are tossed. What is the probability that the total is
divisible by 3 or 4? (M. Com., Delhi, 1975)

[(b) (i) 1/5, (ii) 3/5, (c) 5/9]
SECTION 2
THEORETICAL DISTRIBUTIONS
1. (a) Explain what is meant by the binomial distribution and obtain the
mean and standard deviation of such a distribution.

(b) State the conditions under which the binomial probability model is
appropriate. (M. Com., Delhi, 1975)




2. Show how the binomial distribution arises. Find the mean and variance 
of this distribution. 
3. Define the binomial distribution $N(p+q)^n$ and show that its mean is
$np$ and standard deviation $\sqrt{npq}$. (M. Com., Delhi, 1969)
4. Write a note on the binomial distribution describing the situations in
which it arises and its chief characteristics. (M.A. Econ., Delhi, 1968)


5. (i) Show that when p is small and n is large, the binomial distribution
$P(x) = {}^nC_x\, q^{n-x} p^x$ tends to the Poisson distribution.

(ii) Show that the mean and variance of the Poisson distribution are equal.
(M.A. Econ., 1966)

6. Give the characteristics of the binomial and Poisson distributions.
Give examples to illustrate your answer. (M.A. Econ., Delhi, 1969)


7. Explain the distinctive features of Binomial, Normal and Poisson 
probability distributions. When does a Binomial distribution tend to become 
(i) a Normal and (ii) a Poisson distribution ? Explain clearly. 

(М.А. Econ., Delhi, 1970 ; 
M. Com., Delhi, 1971) 

8. (a) Explain the general characteristics of a Poisson distribution. Give
three examples familiar to you, the distribution of which will conform to the
Poisson form. (M.A. Econ., Punjab, 1965) 

(b) How do the Binomial and the Poisson probability distributions differ ? 
(M. Com., Delhi, 1965) 


(c) Describe the main characteristics of the normal distribution. 
9. (a) What is a Binomial Distribution? (M. Com., Delhi, 1971)

(b) In what sense does the binomial variate x tend to normality as n
becomes very large?


10. (a) Discuss the importance of the normal distribution in statistical
theory. (M.B.A., T.U., 1976)
(b) What are the chief features of the Poisson and Normal Distributions?

Under what conditions does the Binomial tend to the Normal distribution?
(M.B.A., Delhi, 1971)


11. Explain the procedure for fitting a normal curve to a given frequency
distribution and discuss how, after the curve has been fitted, you will proceed to a
test of goodness of fit. (M. Com., Delhi, 1970)


12. (a) How does a normal distribution differ from a binomial
distribution? What are the important properties of a normal distribution? How are
they useful in random sampling investigations? (M. Com., Agra, 1972)
(b) Why does the normal distribution hold the most honourable position
in probability theory? (M. Com., Delhi, 1975)
13. Draw the curve

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

and describe its chief properties.

14. (a) Write a critical note on the role of the normal distribution in Statistics.
(M.A. Econ., Delhi, 1968; M. Com., Delhi, 1970)

(b) State the important properties of the normal distribution.
(B. Com., Bombay, 1970)
When does a binomial distribution tend to become a Poisson distribution?
Under what circumstances would you apply a Poisson distribution in place of a
binomial distribution? (M. Com., Delhi, 1970)




15. (a) Find the mean and variance of the Poisson distribution.
(M.A. Econ., Delhi, 1971)

(b) What are the conditions under which a binomial distribution would
arise? Clearly discuss this distribution and also derive its mean and variance.
Under what conditions would a binomial distribution tend to (i) Normal,
(ii) Poisson? (M.A. Econ., Delhi, 1970)


(c) Calculate the ordinates of the binomial distribution
$128\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^5$. (M. Com., Delhi, 1967)
[4, 20, 40, 40, 20, 4]
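The ordinates of $N(q+p)^n$ are the expected frequencies $N \cdot {}^nC_k\, p^k q^{n-k}$. A minimal sketch; `binomial_frequencies` is a hypothetical helper, and exact fractions keep the integer answers exact.

```python
from fractions import Fraction
from math import comb


def binomial_frequencies(N, n, p):
    """Expected frequencies N * C(n, k) * p^k * q^(n - k) for k = 0..n."""
    q = 1 - p
    return [N * comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]


half = Fraction(1, 2)
print([int(f) for f in binomial_frequencies(128, 5, half)])  # [4, 20, 40, 40, 20, 4]
```

With N = 96 the same call reproduces the table of Question 18, and with N = 800 it gives the 250 families with 3 boys in Question 22.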
16. The probability of a bomb hitting a target is 1/5. Two bombs are
enough to destroy a bridge. If six bombs are aimed at the bridge, find the
probability that the bridge is destroyed. (M.A. Econ., Delhi)
[0.345]
17. If the probability of a defective bolt is 0.2, find (a) the mean and
standard deviation for the distribution of defective bolts in a total of 1,000, and
(b) the moment coefficients of skewness and kurtosis of the distribution.
[X̄ = 200, σ = 12.649, β₁ = 0.00225, β₂ = 3.00025]
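The constants in Question 17 come from the standard binomial formulas: mean $np$, variance $npq$, $\beta_1 = (q-p)^2/npq$ and $\beta_2 = 3 + (1-6pq)/npq$. Question 20 runs them backwards, since $q = \text{variance}/\text{mean}$. A minimal sketch with hypothetical function names.

```python
from math import sqrt


def binomial_constants(n, p):
    """Mean, standard deviation and Pearson's beta coefficients of B(n, p)."""
    q = 1 - p
    variance = n * p * q
    beta1 = (q - p) ** 2 / variance         # skewness
    beta2 = 3 + (1 - 6 * p * q) / variance  # kurtosis
    return n * p, sqrt(variance), beta1, beta2


def binomial_from_mean_sd(mean, sd):
    """Recover n, p, q from mean = np and variance = npq."""
    q = sd**2 / mean
    p = 1 - q
    return mean / p, p, q


print(binomial_constants(1000, 0.2))
print(binomial_from_mean_sd(20, 4))
```

The second call gives, up to float rounding, n = 100, p = 0.2, q = 0.8, matching Question 20.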
18. Five coins are tossed 96 times.
(a) Construct the theoretical frequency table for 0, 1, ..., 5 heads.
(b) Draw the corresponding histogram.
(c) Find the expected number of times of getting at least 3 heads.
[(a) 3, 15, 30, 30, 15, 3; (c) 48]
19. In a seed validity test 450 seeds are placed on filter paper in rows
of 5. The number of seeds that germinated in each row was counted and the
results are shown below:

Number of Seeds          Observed Frequency
Germinating per Row      (No. of Rows)
0                          0
1                          1
2                         11
3                         30
4                         38
5                         10
Total                     90

If the germinating seeds are distributed at random among the rows, we
could expect a binomial distribution with n = 5. Find an estimate of (a) the
average number of seeds germinating per row, (b) the probability of a single seed
germinating, and (c), using the probability obtained in part (b), compute the
expected frequencies of rows for each number of germinating seeds and compare
them with the observed frequencies.
[(a) 3.5; (b) 7/10; (c) 0.22, 2.55, 11.91, 27.78, 32.41, 15.13]
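Fitting a binomial distribution as in Question 19 means estimating p from the sample mean (mean = np) and then computing expected frequencies. A minimal sketch, taking the count of rows with 2 germinating seeds as 11, the value implied by the 90-row total and the printed mean of 3.5.

```python
from math import comb

observed = [0, 1, 11, 30, 38, 10]  # rows with 0..5 germinating seeds
n = 5
rows = sum(observed)
mean = sum(k * f for k, f in enumerate(observed)) / rows
p = mean / n  # binomial mean is np
expected = [rows * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

print(mean, p)  # 3.5 0.7
print([round(e, 2) for e in expected])
```

The rounded expected frequencies match the printed answer: 0.22, 2.55, 11.91, 27.78, 32.41 and 15.13.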
20. The mean of a binomial distribution is 20, and the standard deviation
is 4. Calculate n, p and q. [n = 100, p = 1/5, q = 4/5]
21. If hens of a certain breed lay eggs on four days a week on the average,
find how many days during a season of 200 days a poultry-keeper with eight hens
of this breed will expect to receive at least six eggs. [52]


22. Out of 800 families with 5 children each, how many would you expect
to have (a) 3 boys, (b) 5 girls, and (c) either 2 or 3 boys? Assume equal
probabilities for boys and girls. [(a) 250, (b) 25, (c) 500]


23. A manufacturer finds that one article in every twenty is below the
required standard. How many sub-standard articles would he expect to find in a
sample of 200? [7 to 13]


24. Six dice are thrown 729 times. How many times do you expect at least
three dice to show a five or six? [233]




25. An electric bulb manufacturer finds that 4 per cent of the bulbs are
defective. How many perfect bulbs are to be expected in a sample of 400? What
are the chances that a random sample of 20 will not contain more than one
defective bulb? (380 to 388; 0.8942)


26. During a cold epidemic in a factory the chance of the workers catching
a cold is 10 per cent. What is the probability that out of 5 workers 3 or more
will catch cold? (0.00856)


27. Five dice are thrown together 96 times. The number of times 4, 5 or 6
was actually thrown in the experiment is given below. Calculate the expected
frequencies.

No. of dice showing 4, 5 or 6:   0   1   2   3   4   5
[Expected frequencies: 3, 15, 30, 30, 15, 3]
28. Take 100 sets of 10 tosses of an unbiased (perfect) coin. In how many
cases do you expect to get (a) 7 heads and 3 tails, and (b) at least 7 heads?
(M.A. Econ., Delhi, 1969)
[(a) 12, (b) 17]
29. (a) What are the conditions under which the binomial distribution is
expected?
(b) One hundred and ninety-two families (for each of which the possibility
of an albino child being born is otherwise established) had the following distribution
of albinos among the first three children.

No. of albino children    0    1    2    3   Total
No. of families          77   90   20    6    192

Find the expected frequencies on the basis of a theoretical probability of
0.25 of a child being born an albino, and test the goodness of fit.
(I.A.S., 1968)
[Expected frequencies:
No. of albino children    0    1    2    3
No. of families          81   81   27    3]


30. A distributor of bean seeds determines from extensive tests that 5
per cent of a large batch of seeds will not germinate. He sells the seeds in packets
of 200 and guarantees 90 per cent minimum germination. Determine the
probability that a particular packet will violate the guarantee. [0.0016] (I.A.S., 1965)
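Question 30 uses the Poisson approximation with m = np = 200 × 0.05 = 10; the guarantee is broken when more than 20 seeds in a packet fail to germinate. A minimal sketch; `poisson_cdf` is a hypothetical helper.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))


packet = 200
lam = packet * 0.05         # expected non-germinating seeds per packet
max_failures = packet // 10  # 90% germination allows at most 20 failures
p_violate = 1 - poisson_cdf(max_failures, lam)
print(round(p_violate, 4))  # 0.0016
```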
31. The distribution of the number of suicides of women is given below:

No. of suicides    0     1     2    3    4    5   6   7
Frequencies      364   376   218   89   33   13   2   1

Fit a Poisson distribution and test its goodness of fit.
[337.1, 397.9, 231.6, 92.3, 27.2, 6.4, 1.3 and 0.2] (B. Com., Gujarat University, 1967)
32. A systematic sample of 100 pages was taken from the Concise Oxford
Dictionary and the observed frequency distribution of foreign words per page was
found to be as follows:

No. of foreign words per page   0   1   2   3   4   5   6

Graduate the data by a Poisson distribution, and judge the goodness of
your graduation. (I.A.S., 1970)

[Expected frequencies for 0 to 6 foreign words: 37.14, 36.77, 18.2, 6.0, 1.5, 0.30, 0.05]
33. The table gives the number of mistakes committed per page in typing
a manuscript of 584 pages:

Mistakes per page    0     1     2    3
Number of pages    238   108    97   30

Fit a Poisson distribution to the data given above and test the goodness of
fit. Present the results in a tabular form.

[Mistakes per page   0        1       2       3       4      5      6      7
 Expected pages    231.56   214.2   99.07   30.54   7.06   1.30   0.20   0.03]


34. (a) In a Poisson distribution, the probability P(x) for x = 0 is 10 per
cent. Find the mean of the distribution. [2.3026] (M. Com., Delhi, 1967)

(b) A typist commits the following number of mistakes per page in typing
100 pages. Fit a Poisson distribution and calculate the theoretical frequencies:

Mistakes per page    0    1    2    3    4    5
No. of pages        42   33   14    6    4    1

[36.8, 36.8, 18.4, 6.1, 1.5, 0.3]
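Fitting a Poisson distribution takes the sample mean as m and computes $N e^{-m} m^x / x!$ for each x. A minimal sketch for the typist data of Question 34(b), where the mean works out to exactly 1.

```python
from math import exp, factorial

pages = [42, 33, 14, 6, 4, 1]  # pages with 0..5 mistakes
N = sum(pages)
mean = sum(k * f for k, f in enumerate(pages)) / N  # Poisson parameter m
expected = [N * exp(-mean) * mean**k / factorial(k) for k in range(len(pages))]

print(N, mean)                          # 100 1.0
print([round(e, 1) for e in expected])  # [36.8, 36.8, 18.4, 6.1, 1.5, 0.3]
```

For Question 34(a), P(0) = e^{-m} = 0.10 gives m = ln 10, about 2.3026.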


35. If 3 per cent of the electric bulbs manufactured by a company are defective,
find the probability that in a sample of 100 bulbs (a) 0, (b) 1, (c) 2, (d) 3, (e) 4,
(f) 5 bulbs will be defective.
[(a) 0.05, (b) 0.149, (c) 0.224, (d) 0.224, (e) 0.168 and (f) 0.101]
36. Fit a Poisson distribution to the following data:

No. of deaths recorded
in a day       0     1     2    3    4    5   6   7
No. of days  364   376   218   89   33   13   2   1
(M.Sc., Agra, 1968)

[336.7, 397.3, 234.4, 92.2, 27.2, 6.4, 1.2, 0.2]
37. Articles are produced in large quantities and about 20% of them are
defective. They are dispatched in batches of equal numbers. How large should
a batch be to ensure that not more than 1 in 5 contain more than 2 defective
articles? (M.Sc., Agra)


38. A bombing technique secures 1 out of 10 hits in the target area. Use
the Poisson curve to determine how many bombs should be launched in order to
have a 90% chance of securing at least 8 hits. (M.A. Econ., Delhi)


39. If the average number of rejects in the manufacture of a certain article
is 4%, what are the probabilities of 0, 1, 2, 3, 4 rejects in a sample of 10 articles
taken at random? [0.67, 0.25, 0.05, 0.01, 0.0007]
40. Find the ordinates of the normal curve at
(a) z = 2.68, (b) z = -0.48, (c) z = -1.42.
[(a) 0.0110, (b) 0.3555, (c) 0.1456]

41. Find the area under the normal curve between
(a) z = -1.5 and z = 2.6
(b) z = 1.47 and z = 2.20
(c) z = -2.25 and z = -0.40.
[(a) 0.9285, (b) 0.0569, (c) 0.3324]

42. Find the area under the normal curve (a) to the left of z = 1.64, (b) to
the left of z = 48, (c) to the right of z = -1.32, (d) corresponding to z ≥ 2.12,
(e) to the left of z = -2.12 and to the right of z = 1.46.
[(a) 0.9495, (b) 0.0844, (c) 0.9066, (d) 0.0170, (e) 0.0891]

43. A normal distribution has mean μ = 12 and standard deviation σ = 2.
Find the following areas under the curve:
(a) from x = 10 to x = 13.5
(b) from x = 11.4 to x = 14.2
(c) from x = 9.6 to x = 13.8
(d) from x = 6 to x = 18.
[(a) 0.6147, (b) 0.4822, (c) 0.7008, (d) 0.9973]
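Normal-curve ordinates and areas like those in Questions 40, 41 and 43 can be computed instead of read from tables. A minimal sketch using the standard-library `statistics.NormalDist` (Python 3.8+); `area_between` is a hypothetical helper.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, sd 1


def area_between(a, b, dist=std):
    """Area under the normal curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)


# Question 40: ordinates of the standard normal curve.
print(round(std.pdf(-0.48), 4), round(std.pdf(-1.42), 4))  # 0.3555 0.1456
# Question 41(a): between z = -1.5 and z = 2.6.
print(round(area_between(-1.5, 2.6), 4))  # 0.9285
# Question 43(a): mean 12, sd 2, from x = 10 to x = 13.5.
print(round(area_between(10, 13.5, NormalDist(12, 2)), 4))  # 0.6147
```

Values from the four-figure tables can differ in the last digit, because the tables round z to two decimals.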




44. How would you use a Poisson distribution to find approximately the
frequency of exactly 5 successes in 100 trials, the probability of success in each trial
being p = 0.1? [0.0378] (M. Com., Delhi, 1969)
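Question 44 replaces the binomial term with a Poisson term of mean m = np = 10. A minimal sketch that also prints the exact binomial value, so the quality of the approximation can be seen.

```python
from math import comb, exp, factorial

n, p, k = 100, 0.1, 5
lam = n * p  # Poisson parameter m = np
poisson = exp(-lam) * lam**k / factorial(k)
binomial = comb(n, k) * p**k * (1 - p) ** (n - k)  # exact value, for comparison
print(round(poisson, 4), round(binomial, 4))  # 0.0378 0.0339
```

Here p = 0.1 is not very small, so the approximation is only rough; it improves as p shrinks with np held fixed.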

45. If the heights of 300 students are normally distributed with mean 68
inches and standard deviation 3 inches, how many students have heights:

(a) greater than 72 inches,
(b) less than or equal to 64 inches,
(c) between 65 and 71 inches inclusive, and
(d) equal to 68 inches?

Assume the measurements to be recorded to the nearest inch.
[(a) 20, (b) 36, (c) 227, (d) 40]
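Because Question 45's heights are recorded to the nearest inch, each recorded value stands for a half-inch band on either side (a continuity correction): "greater than 72" starts at 72.5, "equal to 68" covers 67.5 to 68.5, and so on. A minimal sketch using `statistics.NormalDist`.

```python
from statistics import NormalDist

heights = NormalDist(mu=68, sigma=3)
N = 300

more_than_72 = N * (1 - heights.cdf(72.5))
at_most_64 = N * heights.cdf(64.5)
from_65_to_71 = N * (heights.cdf(71.5) - heights.cdf(64.5))
exactly_68 = N * (heights.cdf(68.5) - heights.cdf(67.5))

for label, value in [("(a)", more_than_72), ("(b)", at_most_64),
                     ("(c)", from_65_to_71), ("(d)", exactly_68)]:
    print(label, round(value, 1))
```

This prints roughly 20.0, 36.5, 227.0 and 39.7. The printed answer of 36 for (b) comes from rounding z to two decimals in the table lookup.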
46. The heights of 10,000 college men closely follow a normal distribution
with a mean of 69.0 inches and a standard deviation of 2.5 inches.
(a) How many of these men would you expect to be at least 6 feet in
height?
(b) What range of heights would you expect to include the middle 75% of
the men in this group? [(a) 1151, (b) 66.125 to 71.875]


47. In a restaurant on a particular morning the amounts spent for
breakfast by all patrons follow a normal distribution with mean 87.2 and σ = 12.0 paise.
On that morning, if 420 people spent 85 paise or more for breakfast, what is the
number of people served? [735]

48. In a newspaper article it is stated that the height of 90 per cent of the
male population lies between 5 ft. 1¾ inches and 6 ft. 1½ inches. Assuming that
the distribution of height is normal and that the two limits are placed
symmetrically with respect to the mean, what values can be deduced for the mean and
standard deviation of the height? [X̄ = 67.625, σ = 3.57]
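With symmetric limits enclosing the central 90%, Question 48's mean is the midpoint of the limits, and each limit lies 1.645σ from it (the 95th percentile of the standard normal). A minimal sketch, reading the limits as 61¾ and 73½ inches, which is consistent with the printed mean of 67.625.

```python
from statistics import NormalDist

lower, upper = 61.75, 73.5      # 5 ft 1 3/4 in and 6 ft 1 1/2 in, in inches
z = NormalDist().inv_cdf(0.95)  # about 1.645: 5% in each tail
mean = (lower + upper) / 2
sd = (upper - mean) / z
print(mean, round(sd, 2))  # 67.625 3.57
```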


49. The mean and standard deviation of chest measurements of 1,200 soldiers
are 85 cm and 5 cm respectively. How many of them are expected to have their
chest measurements exceeding 95 cm, assuming the measurements to follow the
normal pattern? (M. Com., Delhi, 1988)

[27]

50. The following frequency table was made from the record of accidents
in a year in 98 textile mills of Bombay:

No. of accidents in a year    0    1    2    3   4   5 or more
No. of mills                 24   38   22   11   3   0

Fit a Poisson distribution to the data. (B. Com., Bombay, 1966)
[x:  0     1     2     3    4    5 or more
 f:  26.8  34.9  22.7  9.8  3.2  0.8]


51. Use the theoretical model which you consider to be most appropriate
for the following data, find the expected frequencies and test for goodness of fit.
Number of ignitions in the manufacture of an explosive per day:

No. of ignitions (k)           0    1    2    3   4   5   6
No. of days with k ignitions  75   90   54   22   6   2   1
(B.A./B.Sc., Sardar Patel Univ., 1969)
[Hint: Fit a Poisson distribution.]
[k:  0     1     2     3     4    5    6
 f:  74.2  90.2  54.9  22.2  6.8  1.6  0.3]
52. Assuming a normal distribution with N = 1000, μ = 80 and σ = 15:
(a) How many observations may be expected to lie between 65 and 110?
(b) Find the value of the variate beyond which 10% of the observations
would lie. (B. Com., Bombay, 1967)
[(a) 818, (b) 99.2]


53. In an intelligence test administered to 1,000 children the average score
is 42 and the standard deviation 24.
(i) Find the number of children exceeding a score of 60, and
(ii) find the number of children with scores lying between 20 and 40.

Assume the normal distribution. [(i) 227, (ii) 289] (B. Com., Bombay, 1966)


54. The distribution of monthly incomes of a group of 3,000 university
teachers conforms to a normal curve with the mean equal to Rs. 600 and the
standard deviation equal to Rs. 100. Find

(i) the percentage of teachers having a monthly income of more than
Rs. 800;
(ii) the number of teachers having a monthly income of less than Rs. 400;
(iii) the highest monthly income amongst the lowest paid 100 teachers; and
(iv) the least monthly income among the highest paid 100 teachers.

(For a normal variate $t = \dfrac{x - \bar{x}}{\sigma}$, the area under the curve between t = 0
and t = 2 is 0.4772 and that between t = 0 and t = 1.83 is 0.4667.)
[(i) 2.28%, (ii) 68, (iii) Rs. 417, (iv) Rs. 783] (B. Com., Bombay, 1967)

55. The table below gives the heights in cm of 1,000 college students.
Fit a normal curve to this distribution:

Height (cm)   Students   Height (cm)   Students
155-157          4        179-181        125
158-160          8        182-184         92
161-163         26        185-187         60
164-166         53        188-190         22
167-169         89        191-193          4
170-172        146        194-196          1
173-175        188        197-199          1
176-178        181

(B.Sc., Sardar Patel Univ., 1968)
56. In a certain factory turning out razor blades there is a small chance
of 1/500 for any blade to be defective. The blades are in packets of 10. Use the
Poisson distribution to calculate the approximate number of packets containing no
defective, one defective and two defective blades respectively in a consignment of
10,000 packets. (M. Com., Rajasthan, 1965)
[N(P₀) = 9802, N(P₁) = 196, N(P₂) = 2]
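For the razor-blade question the Poisson mean per packet is m = 10 × 1/500 = 0.02, and the expected packet counts are 10,000 × e^{-m} m^x / x!. A minimal sketch:

```python
from math import exp, factorial

packets = 10_000
lam = 10 / 500  # expected defective blades per packet of 10
counts = [packets * exp(-lam) * lam**k / factorial(k) for k in range(3)]
print([round(c) for c in counts])  # [9802, 196, 2]
```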


57. (a) If the distribution of incomes of a group of persons is assumed
to be normal with a mean of Rs. 500 and a standard deviation of Rs. 50, estimate the
proportion of individuals with incomes (i) between Rs. 550 and Rs. 650, (ii)
between Rs. 450 and Rs. 475. [(i) 15.74%, (ii) 14.98%]

(b) The scores made by candidates in a certain test are normally
distributed with mean 500 and standard deviation 100. What per cent of candidates
receive scores (i) less than 400, (ii) between 400 and 600? (For a standard normal
distribution $z = \dfrac{x - \bar{x}}{\sigma}$, the area under the curve between z = 0 and z = 1 is 0.3413.)
(B. Com., Bombay, 1970)

[(i) 15.87%, (ii) 68.26%]

58. The local authorities in a certain city install 10,000 electric lamps in the streets of the city. If these lamps have an average life of 1,000 burning hours, with a standard deviation of 200 hours, what number of lamps might be expected to fail (i) in the first 800 hours, (ii) between 800 and 1,200 burning hours? After what period of burning hours would you expect that (i) 10% of the lamps would fail? (ii) 10% of the lamps would still be burning?

(B. Com., Bombay, 1967)

[(i) 1,587, (ii) 6,826; (i) 744 and (ii) 1,256]
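A similar sketch for Question 58, again assuming `statistics.NormalDist`: `cdf` gives the expected number of failures and `inv_cdf` gives the burning hours by which a stated proportion has failed. Exact areas give 6,827 rather than the table-based 6,826 for part (ii).

```python
from statistics import NormalDist

life = NormalDist(mu=1000, sigma=200)   # lamp life in burning hours
lamps = 10_000

fail_first_800 = lamps * life.cdf(800)
fail_800_to_1200 = lamps * (life.cdf(1200) - life.cdf(800))

# Hours by which 10% have failed, and after which only 10% are still burning
hours_10pct_failed = life.inv_cdf(0.10)
hours_10pct_burning = life.inv_cdf(0.90)

print(round(fail_first_800), round(fail_800_to_1200))
print(round(hours_10pct_failed), round(hours_10pct_burning))
```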


R-3:14 STATISTICAL METHODS 
59. At a certain examination, 10% of the students who appeared for the paper in Statistics got less than 30 marks and 97% of the students got less than 62 marks. Assuming the distribution to be normal, find the mean and the standard deviation of the distribution, given that 40% of the area of the normal curve is between the ordinates corresponding to z = 0 and z = 1.3, and 7% of the area is between z = 1.3 and z = 1.9, where z = (X − X̄)/σ. (B. Com., Bombay, 1968)
[X̄ = 43.04, σ = 10.03]


60. (a) The weights of 4,000 students are found to be normally distributed with mean 50 kilograms and standard deviation 5 kilograms. Find the number of students with weights:

(i) less than 45 kg,
(ii) between 45 and 60 kg. [(i) 635, (ii) 3,274]


(b) Suppose that a doorway being constructed is to be used by a class of people whose heights are normally distributed with mean 70" and standard deviation 3". How high should the doorway be without causing more than 25% of the people to bump their heads? If the height of the door is fixed at 76", how many persons out of 5,000 are expected to bump their heads?

(B. Com., Bombay, 1969)
[72.025" and 114]

61. A company estimates that approximately 30 per cent of the special prize coupons it plans to mail for a sales promotion programme will be returned. If the probability that a person receiving a coupon will return it is p = 0.3, what is the probability that more than 165 coupons will be returned if 502 are mailed?

[0.0788]

62. If 20% of the bolts produced by a machine are defective, determine the probability that out of 4 bolts chosen at random (a) 1, (b) 0, (c) at most 2 bolts will be defective. (M. Com., Raj., 1970)

[(a) 0.4096, (b) 0.4096, (c) 0.9728]

63. Show that with an unbiased coin the chance of getting exactly 5 heads in 6 throws is 6/64, and of getting at least 5 heads in 6 throws is 7/64.

(M. Com., Meerut, 1969)


64. Eight coins are thrown simultaneously. Find the chance of obtaining
(i) at least 6 heads,
(ii) no head,
(iii) all heads. (M. Com., Meerut, 1970)
[(i) 37/256, (ii) 1/256, (iii) 1/256]
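Questions 62 to 64 all use the binomial law P(r) = nCr p^r q^(n−r). The short Python sketch below (the function name `binom_pmf` is ours) evaluates it with floats for the bolts and with exact `fractions.Fraction` values for the coins, which reproduces 6/64, 7/64, 37/256 and 1/256 exactly.

```python
from fractions import Fraction
from math import comb

def binom_pmf(r, n, p):
    """Binomial probability of r successes in n trials with success chance p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Question 62: 4 bolts, 20% defective
bolts = [binom_pmf(r, 4, 0.2) for r in range(5)]
print(f"{bolts[1]:.4f} {bolts[0]:.4f} {sum(bolts[:3]):.4f}")

# Questions 63 and 64: an unbiased coin, worked in exact fractions
half = Fraction(1, 2)
exactly_5_of_6 = binom_pmf(5, 6, half)
at_least_5_of_6 = binom_pmf(5, 6, half) + binom_pmf(6, 6, half)
at_least_6_of_8 = sum(binom_pmf(r, 8, half) for r in (6, 7, 8))
no_head_of_8 = binom_pmf(0, 8, half)
all_heads_of_8 = binom_pmf(8, 8, half)
print(exactly_5_of_6, at_least_5_of_6, at_least_6_of_8, no_head_of_8, all_heads_of_8)
```

`Fraction` prints in lowest terms, so 6/64 appears as 3/32.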

65. (a) Describe the nature of the normal curve and its applications.

(b) A student secures 72 marks in an examination in Sociology, for which his class average is 54 and the standard deviation 20. He secures 76 marks in Statistics, for which his class average is 52 and the standard deviation 12.

What can we say about the performance of this student with reference to these two examinations? (Convert the marks into z scores, or standard scores, for both examinations and make your comments.) (B.A., Bombay, 1970)


66. A certain automatic screw manufacturing machine produces on the average one slotless screw in every 100 screws. If the screws are packed in boxes of 300, what percentage of these boxes would you expect to have (i) no slotless screw and (ii) at least one slotless screw? (M. Com., Delhi, 1971)

[(i) 4.98%, (ii) 95.02%]

67. In a certain examination the percentages of passes and distinctions were 46 and 9 respectively. Estimate the average marks obtained by the candidates, the minimum pass and distinction marks being 40 and 75 respectively. (Assume the distribution of marks to be normal.)

Also determine what would have been the minimum qualifying marks for admission to a re-examination of the failed candidates, had it been desired that the best 25% of them should be given another opportunity of being examined.

(M.A. Econ., Delhi, 1970)


QUESTIONS R-315 
68. (a) A survey of male children in 125 families, each having 5 children, gave the following data:

No. of male children    0    1    2    3    4    5    Total
No. of families         9   17   26   39   22   12     125

Fit a binomial distribution to the data.


(b) For a binomial distribution, n = 10 and p = 0.35. Find μ2, μ3 and μ4.
(B. Com., Bombay, 1970)


69. (a) Articles of which 0.1 per cent are defective are packed in boxes, each containing 500 articles. Using the Poisson approximation or otherwise, find the proportion of boxes that contain:

(i) no defective, (ii) two or more defectives.

(log e = 0.4343) [(i) 60.65%, (ii) 9.03%] (B.A., Bombay, 1969)

70. (a) Enumerate the various properties of a normal distribution.

(b) The mean I.Q. of a large number of children of age 14 was 100 and the standard deviation 16. Assuming that the distribution was normal, find:

(i) what percentage of the children had I.Q. under 80;
(ii) between what limits the I.Q.s of the middle 40% of the children lay;
(iii) what % of the children had I.Q.s within the range X̄ ± 1.96σ?

(c) The first and third quartiles of a normal distribution are respectively 93 and 128. Find the mean and the standard deviation. (B.A. PONES 1968)
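A sketch for Question 70(b) and (c), again assuming `statistics.NormalDist`. For (c) it uses the fact that the quartiles of a normal curve lie 0.6745σ on either side of the mean, so σ = (Q3 − Q1)/(2 × 0.6745). With these quartiles that gives a mean of 110.5 and σ of about 25.95.

```python
from statistics import NormalDist

# Question 70(b): I.Q. normally distributed with mean 100 and s.d. 16
iq = NormalDist(mu=100, sigma=16)
under_80 = iq.cdf(80)                                  # z = -1.25
middle_40 = (iq.inv_cdf(0.30), iq.inv_cdf(0.70))       # leaves 30% in each tail
within_1_96 = iq.cdf(100 + 1.96 * 16) - iq.cdf(100 - 1.96 * 16)

# Question 70(c): quartiles lie 0.6745 sigma either side of the mean
q1, q3 = 93, 128
mean_c = (q1 + q3) / 2
sd_c = (q3 - q1) / (2 * NormalDist().inv_cdf(0.75))

print(f"{100 * under_80:.2f}%", [round(v, 2) for v in middle_40], f"{100 * within_1_96:.1f}%")
print(mean_c, round(sd_c, 2))
```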


71. Suppose that the heights of all cakes baked with a certain mix have a mean of 5.3 cm and a standard deviation of 0.75 cm. Assuming that the distribution of the heights of these cakes can be approximated closely by a normal curve, find

(i) the percentage of cakes which have a height of 4.4 cm or less;
(ii) the percentage of cakes having heights from 5 cm to 6.2 cm;
(iii) the height below which we may find the flattest 20 per cent of the cakes;
(iv) the central 50% of the distribution.
[(i) 11.51%, (ii) 54.03%, (iii) 4.67 cm, (iv) 4.794 to 5.806 cm] (B.A., Bombay, 1971)

72. Fit a Poisson distribution to the following data:

x      0     1     2     3     4     Total
f    109    65    22     3     1       200

(e^−0.61 = 0.5436)
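A minimal sketch of the fit in Question 72: estimate m by the sample mean (122/200 = 0.61) and multiply each Poisson probability by N = 200. It uses `math.exp`, so e^−0.61 comes out as 0.5434 rather than the table value 0.5436 quoted above, and the fitted frequencies can differ from a table-based working in the second decimal.

```python
import math

x = [0, 1, 2, 3, 4]
f = [109, 65, 22, 3, 1]
N = sum(f)                                       # 200

m = sum(xi * fi for xi, fi in zip(x, f)) / N     # mean of the observed data, 0.61
expected = [N * math.exp(-m) * m**k / math.factorial(k) for k in x]

for k, obs, exp_f in zip(x, f, expected):
    print(k, obs, round(exp_f, 2))
```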

73. Five per cent of the electric bulbs manufactured by a company are defective. Using the normal approximation, find the probability that in a sample of 400 bulbs 30 or more will be defective. [0.0113] (B.A., Bombay, 1971)

74. Calculate the frequencies of the normal distribution which has the same mean, standard deviation and total frequency as the distribution given below, for the intervals 60-65, 65-70, etc.

x:   60-   65-   70-   75-   80-   85-   90-   95-
f:     3    21   150   335   326   135    26     4

(B.Sc., Sardar Patel Univ., 1972)
[3.1, 30.8, 148.0, 322.2, 319.5, 144.0, 29.6, 2.8]
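A sketch of the working for Question 74 under two stated assumptions: the mean and standard deviation are computed from the class mid-points (62.5, 67.5, ...), and the two end classes are treated as open so that the fitted frequencies add up to 1,000. With `statistics.NormalDist` this gives a mean of 79.945, a standard deviation of about 5.445, and frequencies within a few tenths of the bracketed answer.

```python
import math
from statistics import NormalDist

lower = [60, 65, 70, 75, 80, 85, 90, 95]     # lower class limits
freq = [3, 21, 150, 335, 326, 135, 26, 4]
width = 5
N = sum(freq)                                 # 1,000

mid = [lo + width / 2 for lo in lower]
mean = sum(m * f for m, f in zip(mid, freq)) / N
sd = math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mid, freq)) / N)
fit = NormalDist(mean, sd)

# Cumulative areas at the inner class boundaries 65, 70, ..., 95
cuts = [fit.cdf(b) for b in lower[1:]]
# First and last classes treated as open-ended, so the areas sum to 1
areas = [cuts[0]] + [b - a for a, b in zip(cuts, cuts[1:])] + [1 - cuts[-1]]
expected = [N * a for a in areas]

print(round(mean, 3), round(sd, 3))
print([round(e, 1) for e in expected])
```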


75. In turning out a certain component in a factory, the average proportion of defectives is 1%. What is the probability of two or more defectives in a sample of 100? [0.2642]






76. Assume that the marks in a Graduate Examination are normally distributed with X̄ = 500 and σ = 100. Of 674 students taking this examination, it is desired to pass 550 of them. What should be the lowest marks permitted for passing?

[412] (M. Com., Delhi, 1973)


77. It is known from past experience that in a certain plant there are on the average 4 industrial accidents per month. Find the probability that in a given month there will be fewer than 4 accidents. Assume a Poisson distribution.

[0.805] (M. Com., Delhi, 1973)
78. (a) In a certain examination 20 per cent of the candidates scored 60 or more marks and 30 per cent scored 40 or less marks. Find the mean and standard deviation of marks, assuming that the marks are normally distributed.
[X̄ = 47.74, σ = 14.6] (M. Com., Delhi, 1975)
(b) A telephone exchange receives on an average 4 calls per minute. Find the probability, on the basis of the Poisson distribution (m = 4), of

(i) 2 or less calls per minute,
(ii) up to 4 calls per minute, and
(iii) more than 4 calls per minute.
(M. Com., Meerut, 1975)


79. In a certain examination 20 per cent of candidates scored 60 or more marks and 30 per cent scored 40 or less marks. Find the mean and standard deviation of marks, assuming that marks are normally distributed.

(M. Com., Delhi, 1975)
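Questions 59, 78(a), 79 and 80(b) all solve two equations of the form X = X̄ + zσ. Here is a sketch with `statistics.NormalDist`; the helper name is ours. For Question 79, exact z values (0.8416 and −0.5244) give X̄ ≈ 47.68 and σ ≈ 14.64, close to the rounded table-based 47.74 and 14.6 quoted for Question 78(a). The same helper applied to the figures of Question 80(b) gives about 65.41 and 3.29.

```python
from statistics import NormalDist

def mean_sd_from_two_points(x1, p1, x2, p2):
    """Solve x1 = mean + z1*sd and x2 = mean + z2*sd, where zi cuts off area pi below it."""
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    sd = (x2 - x1) / (z2 - z1)
    return x1 - z1 * sd, sd

# Question 79: 30% score 40 or less; 20% score 60 or more, so 80% lie below 60
mean79, sd79 = mean_sd_from_two_points(40, 0.30, 60, 0.80)

# Question 80(b): 5% below 60 inches; 5% + 40% = 45% below 65 inches
mean80, sd80 = mean_sd_from_two_points(60, 0.05, 65, 0.45)

print(round(mean79, 2), round(sd79, 2))
print(round(mean80, 2), round(sd80, 2))
```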


80. (a) A set of 5 coins was thrown 3,125 times and the number of heads appearing in each throw was recorded as in the following table. Estimate the probability of the appearance of a head in a throw for each coin and calculate the theoretical frequency of each number of heads on the assumption that the binomial law holds:

No. of heads    0     1     2      3     4     5
Frequency      32   225   710   1120   820   218

(M. Com., Meerut, 1975)

(b) Of a large group of men, 5 per cent are under 60 inches in height and 40 per cent are between 60 and 65 inches. Assuming a normal distribution, find the mean height and standard deviation. (M. Com., Delhi, 1975)
[(a) 32, 240, 720, 1,080, 810, 243; (b) X̄ = 65.43, σ = 3.29]
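For Question 80(a), here is a minimal sketch of fitting the binomial. It estimates p from the mean number of heads (3.0 out of 5, so p = 0.6) and then expands N(q + p)^5. It reproduces the bracketed frequencies 32, 240, 720, 1,080, 810 and 243.

```python
from math import comb

heads = [0, 1, 2, 3, 4, 5]
freq = [32, 225, 710, 1120, 820, 218]
n_coins, N = 5, sum(freq)                  # 5 coins, 3,125 throws

mean_heads = sum(h * f for h, f in zip(heads, freq)) / N
p = mean_heads / n_coins                   # estimated chance of a head for each coin
q = 1 - p
expected = [N * comb(n_coins, k) * p**k * q ** (n_coins - k) for k in heads]

print(p, [round(e) for e in expected])
```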
SECTION 3 


TESTS OF SIGNIFICANCE 


1. Discuss the main principles of large sample theory with a special 
reference to sampling of attributes. 


2. Why should there be different formulae for testing the significance of the difference between means when the samples are (a) small and (b) large?
(M.A. Econ., Delhi, 1969)
3. What is a sampling distribution? Explain the role of standard errors in large sample tests. (B.A., Madras, 1966)


4. How do you test the significance of the difference of two variances when the samples are (i) large, and (ii) small? Mention the assumptions involved and the statistics used in each case. (M.A. Econ., Punjab, 1968)
5. Define Student's 't' and write down, without proof, its sampling distribution. (I.A.S., 1969)


6. (i) Explain the terms (a) null hypothesis, and (b) the level of significance.
(ii) Explain the concept of null hypothesis.




7. Explain what you understand by the statement that the means of two random samples differ significantly from one another at the 5% level.
(M.A. Econ., Delhi, 1970)
8. What do you understand by the t-test and Fisher's F-test? Indicate some practical applications of these tests. (M.A. Econ., Delhi, 1968)


9. Describe the different uses of the F statistic, stating clearly the assumptions involved.

10. Explain how Student's 't' test is a landmark in the development of statistical methods. (M.A. Econ., Delhi, 1969)

11. Explain the uses of the t-test and F-test, indicating in each case what exactly is sought to be tested and under what assumptions the test will be valid.

12. Bring out clearly what is meant by the sampling distribution of a statistic. Develop your answer with reference to one statistic. (I.A.S., 1969)
13. Define the standard error of a statistic. Explain the basis of sample tests of significance based on standard error. (I.A.S., 1972)


14. Explain how Student's 't' distribution may be used to

(i) test the significance of the sample correlation coefficient in a sample drawn from a bivariate normal population;
(ii) test the significance of the difference between the mean yields of two varieties in an agricultural experiment;
(iii) explain Fisher's transformation of the correlation coefficient and indicate its use in tests of significance.

Define Student's 't'. Discuss briefly the different tests based on t.
(B.A., Bombay, 1970)


15. Discuss the importance of Student's 't' distribution in exact tests of significance. When are tests based on this distribution preferred to the older methods based on the normal distribution? (M.A. Econ., Delhi, 1966)

16. (a) What is meant by 'standard error'? How are standard errors helpful in testing hypotheses and in decision-making? (M.A., Gorakhpur, 1968)

(b) What is 'standard error'? Discuss its role in large sample theory.
(M. Com., Delhi, 1975)
17. What do you understand by a test of significance? Describe the steps involved in testing the significance of an observed correlation coefficient.
(B.Sc., Madras, 1970)


18. Explain how Student's t-distribution is used to test the significance of the difference between the means of two samples, stating clearly the underlying assumptions. (B.A., Bombay, 1970)

19. A coin is tossed 576 times and the number of heads observed is 280. Can the coin be regarded as unbiased? Also, between what limits does p, i.e., the chance of getting a head at a single throw with this coin, almost certainly lie?

[Difference/S.E. = 0.67, coin is unbiased; p = 0.4861 ± 3(0.02083)]
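Here is a minimal sketch of the large-sample test in Question 19, using 3 S.E. limits as in the bracketed answer. The S.E. of the proportion is taken at p = 1/2, which is how 0.02083 arises. The variable names are illustrative.

```python
import math

n, heads, p0 = 576, 280, 0.5           # p0 = 1/2 for an unbiased coin

se_heads = math.sqrt(n * p0 * (1 - p0))    # S.E. of the number of heads, 12
ratio = abs(heads - n * p0) / se_heads     # |280 - 288| / 12

p_hat = heads / n                          # observed proportion of heads
se_p = math.sqrt(p0 * (1 - p0) / n)        # S.E. of the proportion, 0.02083
limits = (p_hat - 3 * se_p, p_hat + 3 * se_p)

print(round(ratio, 2), "unbiased" if ratio < 1.96 else "biased")
print([round(v, 4) for v in limits])
```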


20. 1,000 apples are taken from a large consignment and 100 are found to be bad. Estimate the percentage of bad apples in the consignment and assign the limits within which the percentage lies.

[S.E. = 0.0095; % of bad apples in the consignment = 7.15 to 12.85]

21. A random sample of 500 pineapples was taken from a large consignment and 65 were found to be bad. Show that the S.E. of the proportion of bad ones in a sample of this size is 0.015, and deduce that the percentage of bad pineapples in the consignment almost certainly lies between 8.5 and 17.5.

(B. Com., Bombay, 1969)
[S.E. = 0.015 or 1.5 per cent]




22. At a certain date in a large city, 400 out of a random sample of 500 men were found to be smokers. After the tax on tobacco had been heavily increased, another random sample of 600 men in the same city included 400 smokers. Was the observed decrease in the proportion of smokers significant?

[The decrease is significant.]
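Here is a sketch of the two-proportion test for Question 22. It assumes the first sample has 400 smokers out of 500 and uses the pooled proportion (800/1,100) in the standard error. The function name is ours. A textbook working may instead use unpooled proportions, which changes the ratio slightly but not the conclusion.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample z for p1 - p2, with a pooled proportion in the S.E."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(400, 500, 400, 600)   # before and after the tax increase
print(round(z, 2), "significant at 1%" if z > 2.58 else "not significant at 1%")
```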


23. In a sample of ... from a village in Rajasthan, 280 are found to be wheat eaters and the rest rice eaters. Can we assume that both the food articles are equally popular?

[Since the difference is more than 2.58 S.E. at the 1% level, our hypothesis does not hold good. The food articles are not equally popular.]


24. The mean of a random sample of 100 individuals from a population is 64.3". The standard deviation of the sample is 2.7". Would it be unreasonable to suppose that the mean of the population is 60"? (B.A., Madras, 1966)
[Difference/S.E. = 15.93. Hypothesis is doubtful.]
25. A sample of 400 items is taken from a population whose standard deviation is 15. The mean of the sample is 25. Test whether the sample has come from a population with mean 26.8. Also calculate the 98% confidence limits of the population mean.
[Difference/S.E. = 1.8/0.75 = 2.4. Difference not significant at 1% level. 98% confidence limits: 25 ± 1.75, i.e., 23.25 to 26.75.]
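Here is a sketch of the one-sample large-sample test and confidence limits in Question 25. It assumes the sample mean is 25, the value consistent with the stated limits. The exact 98% point 2.326 gives 23.26 to 26.74, against 23.25 to 26.75 with the rounded table value 2.33.

```python
import math
from statistics import NormalDist

n, sigma = 400, 15                 # sample size and known population s.d.
sample_mean, mu0 = 25, 26.8

se = sigma / math.sqrt(n)          # 0.75
ratio = abs(sample_mean - mu0) / se

z_1pct = NormalDist().inv_cdf(0.995)   # two-tailed 1% point, about 2.576
z_98 = NormalDist().inv_cdf(0.99)      # 98% confidence point, about 2.326
limits = (sample_mean - z_98 * se, sample_mean + z_98 * se)

print(round(ratio, 2), "significant" if ratio > z_1pct else "not significant", "at 1%")
print([round(v, 2) for v in limits])
```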
26. In a random sample of 500 members the mean is found to be 20. In another independent sample of 400 the mean is 15. Could the samples have been drawn from the same population with standard deviation 4?

[Difference/S.E. = 5/0.27 = 18.5. Highly significant.]


27. A railway company installed two sets of 50 Burma ties each. The two sets were treated with creosote by two different processes. After a number of years of service it was found that 22 ties of the first set and 18 ties of the second set were still in good condition. Are we justified in claiming that there is no real difference between the preserving properties of the two processes?

(I.A.S., 1967)
[Difference = 0.08; Difference/S.E. = 0.08/0.098 = 0.82. Not at all significant.]

28. Random samples of 200 bolts manufactured by machine A and 100 bolts manufactured by machine B showed 19 and 5 defective bolts respectively. Is there a significant difference between the performance of the two machines?

[Diff./S.E. = 1.29] (B.A., Bombay, 1970)

29. A random sample of 400 members is found to have a mean of 4.45 cm. Can it be reasonably regarded as a sample from a large population whose mean is 5 cm and whose variance is 4? (B. Com., Bombay, 1969)

[Difference/S.E. = 0.55/0.1 = 5.5]
30. (a) A coin is tossed 10,000 times and the head turns up ... times. Would you consider the coin biased?

(b) A random sample of 1,000 mill workers at Kanpur showed their mean wages to be Rs. 47 per month with a standard deviation of Rs. 28. A sample of 150 mill workers in Bombay showed their mean wages to be Rs. 49 per month with a standard deviation of Rs. 40. On the basis of the data, would you say that the mean wages of mill workers in Bombay are higher than those at Kanpur?

(M.A. Econ., Lucknow, 1967)
[(a) Difference/S.E. = 3.9; (b) Difference/S.E. = 0.591, No.]




31. 500 articles were selected at random out of a batch containing 10,000 articles and 30 were found to be defective. How many defective articles would you reasonably expect to find in the whole batch? (M. Com., Calcutta, 1969)

[Between 282 and 918]

32. Out of a consignment of 1,00,000 tennis balls, 490 were selected at random and examined, and it was found that 20 of these were defective. How many defective balls can you reasonably expect to have in the whole consignment at the 95% confidence level? [2,340 to 5,920] (B. Com., Bombay, 1967)


33. The mean height of a group of 1,594 men belonging to a certain caste was found to be 67.34 inches and the standard deviation was 2.73 inches. The mean height of another group of 1,321 men belonging to a second caste was 64.22 inches and the standard deviation was 2.69 inches. Can the two castes be considered to be significantly different in height? (M. Com., Calcutta, 1967)

[Difference/S.E. = 33. Castes are significantly different in height.]

34. A random sample of 1,000 farms in a certain year gives an average yield of wheat of 2,000 lb. per acre with a standard deviation of 192 lb. A random sample of 1,000 farms in the following year gives an average yield of 2,100 lb. per acre with a standard deviation of 202 lb. Are these data consistent with the hypothesis that the average yields in the country were the same in these two years?
(M. Com., Calcutta, 1969)

[Difference/S.E. = 100/8.8 = 11.36. The average yields are not the same.]
35. A sample of 50 pieces of a certain type of string was tested. The mean breaking strength turned out to be 14.5 pounds. Test whether the sample is from a batch of strings having a mean breaking strength of 15.6 pounds and standard deviation of 2.2 pounds. (M. Com., Calcutta, 1970)

[Difference/S.E. = 3.55. Hypothesis is not correct, i.e., the sample came from a different population.]


36. A random sample of 200 villages was taken from Kanpur district and the average population per village was found to be 420 with an S.D. of 50. Another random sample of 200 villages from the same district gave an average population of 480 per village with an S.D. of 60. Is the difference between the averages of the two samples statistically significant? (M. Com., Raj., 1966)
[Difference/S.E. = 10.86, hence significant.]


37. In a recent survey, two samples were drawn, each containing 500 hamlets. In the first sample the mean population per hamlet was found to be 100 with an S.D. of 20, while in the second sample the mean population was 120 with an S.D. of 15. Is the difference between the averages of the two samples statistically significant?

(M. Com., Raj., 1965)

[Difference/S.E. = 17.85. Difference is significant.]

38. An auto company decided to introduce a new six-cylinder car whose mean gas consumption is claimed to be lower than that of the existing auto-engine. A sample of 50 new cars was taken and tested for gas consumption on test runs. It was found that the mean gas consumption for the 50 cars was 30 m.p.g. with a standard deviation of 3.5 m.p.g. Test for the company, at the 4% level of significance, whether the claim that the new car's gas consumption is 28 m.p.g. on the average is acceptable. (M. Com., Calcutta, 1966)

[Difference/S.E. = 4. Sample is much more economical in gas consumption than the population mentioned.]




39. From a normal population, a random sample of size 32 was drawn and the sample standard deviation was found to be 1.38. Using the 1 per cent level of significance, decide whether it would be reasonable to adopt the value unity for the population standard deviation. (I.A.S., 1968)

[Difference/S.E. = 0.38/0.125 = 3.04]
40. Two groups consisting of 400 and 500 persons have mean heights of 65.71 inches and 68.5 inches respectively. Their variances are 6.4 and 6.0 respectively. Examine whether the difference in means is significant. (B.A., Madras, 1966)

[Difference/S.E. = 14.3. Difference is significant.]


41. A sample of 10 measurements of the diameter of a sphere gave a mean X̄ = 4.38 inches and a standard deviation s = 0.06 inches. Find (a) 95% and (b) 99% confidence limits for the actual diameter.
[95% limits 4.343 to 4.417; 99% limits 4.331 to 4.429]
42. The yields of two types of gram, Type 17 and Type 51, in pounds per acre at 6 replications are given below. What comments would you make on the difference in the mean yields? You may assume that for 5 degrees of freedom and P = 0.2, t is approximately 1.476.

Replication   Yield in pounds (Type 17)   Yield in pounds (Type 51)
1                    20.50                       24.86
2                    24.60                       26.39
3                    23.06                       28.19
4                    29.98                       30.75
5                    30.37                       29.97
6                    23.83                       22.04

(I.A.S., 1967)
[t = 1.493. For v = 5, t0.2 = 1.476. The results of the experiment do not support the hypothesis. There is a difference in yields.]
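The bracketed t for Question 42 has 5 degrees of freedom, which implies that the six replications were treated as pairs. Below is a sketch of that paired t-test with the standard-library `statistics` module, whose `stdev` uses the n − 1 divisor. It gives t ≈ 1.494, which agrees with 1.493 up to rounding.

```python
import math
from statistics import mean, stdev

type_17 = [20.50, 24.60, 23.06, 29.98, 30.37, 23.83]
type_51 = [24.86, 26.39, 28.19, 30.75, 29.97, 22.04]

d = [b - a for a, b in zip(type_17, type_51)]   # difference in each replication
n = len(d)
t = mean(d) / (stdev(d) / math.sqrt(n))          # stdev divides by n - 1

print(round(t, 3), "d.f. =", n - 1, "exceeds 1.476:", t > 1.476)
```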
43. The following table shows the results of an experiment with ten men on the effects of two supposedly soporific drugs A and B in producing sleep:

Additional hours of sleep gained by the use of the two tested drugs
Patient    1     2     3     4     5     6     7     8     9    10
Drug A    0.7  -1.6  -0.2  -1.2  -0.1   3.4   3.7   ...   ...   ...
Drug B    1.9   0.8   1.1   0.1  -0.1   4.4   5.5   ...   ...   ...

Test the efficacy of these two drugs as soporifics on the assumption that different random samples of patients were used to test the different drugs.
(B. Com., Bombay, 1972)

[t = 3.78, v = 9, t0.01 = 3.25. The two drugs differ materially.]


44. Ten specimens of wires drawn from a large lot have the following breaking strengths (in kg. wt.):

578, 572, 570, 568, 572, 578, 570, 572, 596, 548

Test whether the mean breaking strength of the lot may be taken to be 578 kg. wt.
(B.A., Bombay, 1970)

[t = 1.54, v = 9, t0.05 = 2.26. Yes]
45. A farmer grows crops on two fields A and B. On A he puts Rs. 10 worth of manure per acre and on B Rs. 20 worth. The net returns per acre, exclusive of the cost of manure, on the two fields in the five years are:

Year                     1    2    3    4    5
Field A, Rs. per acre   34   28   42   37   44
Field B, Rs. per acre   36   33   48   38   50

Other things being equal, discuss whether it is likely to pay the farmer to continue the more expensive dressing. The value of t at 4 d.f. is 2.776.
(M.A. Econ., Punjab, 1966)

(t = 3.814, v = 4, t0.05 = 2.776. Yes)


46. At an agricultural station it was decided to test the effect of a given fertilizer on rice production. To accomplish this, 26 plots of land having equal areas were chosen. Half of these were treated with the fertilizer and the other half were untreated (control group). Otherwise the conditions were the same. The mean yield of rice on the untreated plots was 5.4 bushels with a standard deviation of 0.50 bushels, while the mean yield on the treated plots was 5.7 bushels with a standard deviation of 0.40 bushels. Can we conclude that there is a significant difference in rice production because of the fertilizer?

(t = 1.68, v = 24, t0.05 = 2.06)

47. To compare the price of a certain commodity in two towns, ten shops were selected at random in each town. The following figures give the prices found:

Town A   61   63   56   63   56   63   59   56   44   61
Town B   55   54   47   59   51   61   57   54   64   58

Test whether the average price can be said to be the same in the two towns.
(M.A. Econ., Delhi, 1971)

(t = 0.906, v = 18. For v = 18, t0.05 = 2.10. Average price is the same in the two towns.)
48. The I.Q.s (intelligence quotients) of 16 students from one area of a city showed a mean of 107 with a standard deviation of 10, while the I.Q.s of 14 students from another area of the city showed a mean of 112 with a standard deviation of 8. Is there a significant difference between the I.Q.s of the two groups at (a) 0.01 and (b) 0.05 level of significance?
(t = 1.45, v = 28, t0.05 = 2.05, t0.01 = 2.76. There is no significant difference between the intelligence quotients of the two groups.)
49. The mean lifetime of electric bulbs produced by a company has in the past been 1,120 hours with a standard deviation of 125 hours. A sample of 100 electric light bulbs recently chosen from a supply of newly produced bulbs showed a mean lifetime of 1,070 hours. Test the hypothesis that the mean lifetime of the bulbs has not changed, using a level of significance of (a) 0.05 and (b) 0.01.

[Difference/S.E. = 50/12.5 = 4]
50. The memory capacity of 9 students was tested before and after training. State whether the training was effective or not from the following scores:
Se 26. of 8 9 


Student 1 25 «304 
QU «15-49 15. mE Wider (леч 274 
Before 1 184729 ^3 


After 12 17 8 5768 Т 
(B. Com., Bombay, 1966)

(Hint: Apply the difference test. t = 1.36, t0.05 = 2.36. The hypothesis is true, i.e., the training was not effective.)


51. A group of 8 psychology students were tested for their ability to 
remember certain material and their scores (number of items remembered) were 


as follows : 
A    B    C    D    E    F    G    H
19 14 13 16 19 18 10 17 
They were then given special training purporting to improve memory and were re-tested after a month.
Their scores were then:


A    B    C    D    E    F    G    H
29 20 7 21 23 21 21 18 




A control group of 7 students was also tested and re-tested after a month but was not given special training. The scores in the two tests were:
       J    K    L    M    N    O    P
(1)   21   19   16   22   18   20   19
(2)   21   23   16   24   17   17   16


(i) Compare the change in each of the two groups by calculating t, and test whether there is significant evidence to show the value of the special training.


(ii) Is there evidence that the experiment was not properly designed?


( (i) у=7, 17:33 highly significant, v—6 not at all significant. ) 
(ii) v—13, t=2'66 and hence significant. 


52. A random sample of 27 pairs of observations from a normal population gives a correlation coefficient of +0.6. Is it likely that the variables in the population are uncorrelated?

(t = 3.75, v = 25, t0.05 = 2.06. Variables in the population are correlated.)
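Here is a minimal sketch of the t-test for a correlation coefficient used in Questions 52 and 60(b): t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The function name is illustrative. It gives 3.75 and about 1.594, as in the bracketed answers.

```python
import math

def t_for_r(r, n):
    """Student's t for testing rho = 0: r * sqrt(n - 2) / sqrt(1 - r^2), on n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

t52 = t_for_r(0.6, 27)     # Question 52, 25 d.f.
t60 = t_for_r(0.45, 12)    # Question 60(b), 10 d.f.
print(round(t52, 2), round(t60, 3))
```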

53. Two groups of students are given an intelligence test (x) and an arithmetic test (y).

Group I:   n1 = 45, r1 = 0.45
Group II:  n2 = 39, r2 = 0.38
Is the difference between the values of r significant? (I.A.S., 1967)
(Difference/S.E. = 0.37. The difference is not significant.)
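Here is a sketch of Fisher's z-transformation for Question 53. It assumes the group sizes and coefficients are n1 = 45, r1 = 0.45 and n2 = 39, r2 = 0.38. The transformed value z = tanh⁻¹ r has S.E. 1/√(n − 3), so the difference of the two z values is divided by √(1/(n1 − 3) + 1/(n2 − 3)). The ratio comes out at about 0.37.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Fisher's z = atanh(r); returns (difference of z values, S.E., |difference| / S.E.)."""
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return diff, se, abs(diff) / se

diff, se, ratio = compare_correlations(0.45, 45, 0.38, 39)
print(round(diff, 3), round(se, 3), round(ratio, 2))
```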


54. Two new types of rations are fed to pigs. A sample of 12 pigs is fed type A ration, and another sample of 12 pigs is fed type B ration. The gains in weight are recorded below (in pounds):

Type A: 31 34 24 29 26 32 35 38 34 30 29 32

Type B: 26 24 28 29 30 29 32 26 35 29 32 28

(i) At the 5% level, test whether one or the other of the types is better.

(ii) At the 1% level, test whether type A is better than type B.

At the 5% level, test also whether the variance of the first population (type A) is greater than 9. (B. Com., Bombay, 1967)

(t = 2.3, t0.05 = 2.07, t0.01 = 2.82)

55. From the data given below, compute the standard error of the difference of the two sample means and find out if the two means differ significantly at a critical probability of 5%.


No. items Mean S.D. 
Sample B 36 1815 3:0 


Sample A _ 25 17r0 36 
(M.A. Econ., Gorakhpur, 1968)


(Difference/S.E. = 2.87. The difference in the means is significant.)


56. The means of two random samples of sizes 9 and 7 are 196.42 and 198.42 respectively. The sums of the squares of the deviations from the mean are 26.94 and 18.73 respectively. Can the samples be considered to have been drawn from the same normal population? (M. Com., Gorakhpur, 1967)
(t = 2.21, v = 14; t0.05 for v = 14 is 2.145. No.)
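Here is a sketch of the pooled two-sample t-test in Question 56. It works directly from the given sums of squares of deviations, SS1 and SS2, with S² = (SS1 + SS2)/(n1 + n2 − 2). The function name is ours. It gives t ≈ 2.197, printed above as 2.21, and compares it with the quoted 5% point 2.145.

```python
import math

def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Two-sample t with S^2 = (SS1 + SS2) / (n1 + n2 - 2); returns (t, d.f.)."""
    df = n1 + n2 - 2
    s2 = (ss1 + ss2) / df
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))
    return abs(mean1 - mean2) / se, df

t, df = pooled_t(196.42, 26.94, 9, 198.42, 18.73, 7)
print(round(t, 3), df, "significant" if t > 2.145 else "not significant")
```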


57. The incomes of a random sample of engineers in Industry I are Rs. 630, 650, 680, 690, 710 and 720 per month. The incomes of a similar sample from Industry II are Rs. 610, 620, 650, 660, 690, 700, 710, 720 and 730 p.m. Discuss the validity of the suggestion that Industry I pays its engineers much better than Industry II. (M.A. Econ., Delhi, 1968)
(20:1, v=14, For v=14, t9.957-2715. The suggestion is valid.) 


58. A correlation coefficient of 0.72 is obtained from a sample of 29 paired observations. Is it significantly different from 0.8?

(I.A.S., 1972)
(Difference/S.E. = 0.985. Difference is not significant.)


59. The heights of six randomly chosen sailors are 63", 65", 58", 69", 71" and 72". The heights of ten randomly chosen soldiers are 61", 62", 65", 66", 69", 69", 70", 71", 72" and 73". Do these heights show that soldiers are on an average shorter than sailors? (5 per cent value of t for 14 degrees of freedom = 2.145)

(M.A. Econ., Gorakhpur, 1966)

[t = 0.60, v = 14, t0.05 = 2.145. No]


60. (a) A random sample of 16 values from a normal population showed a mean of 41.5 and a sum of squares of deviations from the mean equal to 135. Can it be assumed that the mean of the population is 43.5? (Use the 5 per cent level of significance.) (B.A., Bombay, 1972)

[t = 2.67 for v = 15, t0.05 = 1.753]

(b) A random sample of 12 pairs of observations from a normal population gives a coefficient of correlation of 0.45. Is this value significant of correlation in the population? (B. Com., Delhi, 1968)

(t = 1.594 for v = 10, t0.05 = 2.228. No)


61. In a certain factory there are two independent processes manufacturing the same item. The average weight in a sample of 250 items produced from one process is found to be 120 ozs. with a standard deviation of 12 ozs., while the corresponding figures in a sample of 400 items from the other process are 124 and 14. Obtain the standard error of the difference between the two sample means. Is this difference significant? Also find the 99 per cent confidence limits for the difference in the average weights of items produced by the two processes respectively. (B. Com., Bombay, 1967)

[S.E. = 1.03; Difference/S.E. = 3.87; 99% confidence limits are 1.34 to 6.66]
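Here is a sketch of the large-sample test for the difference of two means, applied to Question 61 and, for comparison, to the figures of Question 34. It assumes the sample standard deviations can stand in for the population values, as large-sample theory allows. The S.E. is √(s1²/n1 + s2²/n2), and the 99% limits use the exact point 2.576. This reproduces 1.34 to 6.66, and the ratio for Question 34 comes out at about 11.35.

```python
import math
from statistics import NormalDist

def diff_means(m1, s1, n1, m2, s2, n2):
    """Large-sample S.E. of the difference of two means, and the ratio Diff./S.E."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return se, abs(m2 - m1) / se

se61, ratio61 = diff_means(120, 12, 250, 124, 14, 400)   # Question 61
z99 = NormalDist().inv_cdf(0.995)                        # about 2.576
limits61 = (4 - z99 * se61, 4 + z99 * se61)              # difference is 124 - 120 = 4

se34, ratio34 = diff_means(2000, 192, 1000, 2100, 202, 1000)   # Question 34

print(round(se61, 3), round(ratio61, 2), [round(v, 2) for v in limits61])
print(round(se34, 2), round(ratio34, 2))
```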


62. The standard deviation of the wages of 2,100 textile workers is Rs. 12.8; another sample of 600 textile workers gives the standard deviation as Rs. 15.7. Find out if the standard deviation of the first sample differs significantly from the combined standard deviation of the two samples, which is Rs. 14.0.

(M. Com., Gorakhpur, 1969)


Difference _ V2 s. 
[тутт 0216755016 | 


63. (a) In a random sample of 64 apples taken from a large consignment, some were found to be bad. Deduce that the percentage of bad apples in the consignment almost certainly lies between 31.25 and 68.75, given that the standard error of the proportion of bad apples in the sample is 1/16.

(B. Com., Bombay, 1968)


(b) The mean lifetime of 100 fluorescent light tubes produced by a company is found to be 1,570 hours with a standard deviation of 120 hours. Test the hypothesis that the mean lifetime of the tubes produced by the company is 1,600 hours against the alternative hypothesis that it is greater than 1,600 hours at the 5 per cent level of significance. (B. Com., Bombay, 1968)

[Difference/S.E. = 30/12 = 2.5]




64. (a) A random sample of 100 students gave a mean weight of 58 kilograms with an S.D. of 4 kg. Test the hypothesis that the mean weight in the population is 60 kg. (B. Com., Bombay, 1968)
[Difference/S.E. = 2/0.4 = 5]


(b) A sample of 600 persons selected randomly from a large city gives the result that males are 53 per cent. Is there reason to doubt the hypothesis that males and females are in equal numbers in the city? (B. Com., Bombay, 1973)
[Difference/S.E. = 1.47. Hypothesis is true.]


65. A random sample of 15 from a normal population gives a correlation coefficient of −0.5. Is this significant of the existence of correlation in the population? [t = 2.069] (M. Com., Madras, 1967)


66. Show that in samples of 25 from an uncorrelated normal population the chance is 1 in 100 that r is greater than about 0.43.


67. What is the least value of r in a random sample of 38 pairs that is significant (a) at the 0.05 level, (b) at the 0.01 level? (0.32, 0.413)


68. A potential buyer of light bulbs bought 50 bulbs of each of two
brands. Upon testing these bulbs he found that brand A had a mean life of 1,282
hours with a standard deviation of 80 hours whereas brand B had a mean life of
1,208 hours with a standard deviation of 94 hours. Do the two brands differ in
quality? (M.A. Econ., Punjab, 1968)
[Difference/S.E. = 4.25. Yes]
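For two large independent samples the standard error of the difference of means is √(s₁²/n₁ + s₂²/n₂). A minimal sketch of that ratio (the helper name is illustrative):

```python
import math

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample statistic |x̄1 − x̄2| / √(s1²/n1 + s2²/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / standard_error

# Question 68: brands A and B, 50 bulbs each
z = z_two_means(1282, 80, 50, 1208, 94, 50)
print(round(z, 2))
```

The ratio (about 4.24; the key rounds it to 4.25) far exceeds 1.96. The same function gives about 5.52 for the two village samples of Question 86(a).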

69. In an examination in Psychology, 12 students in one class had a mean
grade of 78 with a standard deviation of 6, while 15 students in another class had
a mean grade of 74 with a standard deviation of 8. Is there a significant difference
between the means of the two groups? (B.A., Bombay, 1970)
(t = 1.44, v = 25, t(0.05) = 2.06. No.)


70. The following data are obtained from an investigation:

                    Sample I     Sample II
No. of cases           400          900
Mean wage          Rs. 47.4     Rs. 50.3
S.D. of wage        Rs. 3.1      Rs. 3.3

Find out whether the two mean wages are significantly different.
(M.A. Econ., Delhi, 1969)
(Difference/S.E. = 2.9/0.18 = 16.1; Yes)


71. Eight pots growing three barley plants each were exposed to a high
tension discharge, while nine similar pots were enclosed in an earthed wire cage.
The numbers of tillers in each pot were as follows:

Caged         17, 27, 18, 25, 27, 29, 27, 23, 17
Electrified   16, 16, 20, 16, 20, 17, 15, 20

Find out whether electrification exerts any real effect on tillering.
(B. Com., Bombay, 1966)
[t = 3.05, v = 15, t(0.05) = 2.13. Yes]
72. (a) A sample of 10 is drawn at random from a normal population. The
sum of squared deviations from the mean is 50. Test the hypothesis that the
variance of the population is 5. Find a 95% confidence interval for the population
variance. (M.A. Econ., Delhi, 1971)
[2.63 to 18.52]
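Part (a) uses χ² = Σ(x − x̄)²/σ₀² with n − 1 = 9 degrees of freedom, and the 95% interval runs from Σ(x − x̄)²/χ²(upper) to Σ(x − x̄)²/χ²(lower). A sketch, assuming the standard table points 19.023 and 2.700 (2.5% in each tail for 9 d.f.) are supplied by hand:

```python
def variance_test_and_interval(sum_sq_dev, hypothesised_var, chi2_upper, chi2_lower):
    """Chi-square statistic for H0: variance = hypothesised_var, plus a
    confidence interval for the variance.

    chi2_upper / chi2_lower are table points cutting off 2.5% in the upper
    and lower tails for n - 1 degrees of freedom (entered by hand).
    """
    chi2 = sum_sq_dev / hypothesised_var
    interval = (sum_sq_dev / chi2_upper, sum_sq_dev / chi2_lower)
    return chi2, interval

# Question 72(a): n = 10, sum of squared deviations = 50, H0: variance = 5
chi2, (low, high) = variance_test_and_interval(50, 5, 19.023, 2.700)
print(chi2, round(low, 2), round(high, 2))
```

Since χ² = 10 lies between 2.700 and 19.023, the hypothesis σ² = 5 is not rejected at the 5% level.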




(b) The following data relate to a random sample of government employees
in two States of the Indian Union:

                                 State I    State II
Sample size                         16         25
Mean monthly income (Rs.) of
the sample employees               440        460
Sample variance                     40         42

First, carry out a test of hypothesis that the variances of the two populations are
equal. In the light of the result of the above test, carry out a test of the hypothesis
that the means of the two populations are equal. (M.A. Econ., Delhi, 1971)
[F = 1.05, t = 9.73]
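Part (b) is a two-stage procedure: a variance-ratio (F) test, then, if the variances can be taken as equal, Student's t with a pooled variance. A sketch, assuming the printed "sample variance" is the unbiased estimate s² and reading the State II variance as 42 (the value implied by the printed F = 1.05); the function names are illustrative:

```python
import math

def f_ratio(var1, var2):
    """Larger sample variance over the smaller, as in the variance-ratio test."""
    return max(var1, var2) / min(var1, var2)

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Student's t for two means, pooling the unbiased variances s1², s2²."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return abs(mean1 - mean2) / standard_error, n1 + n2 - 2

# Question 72(b): State I (n = 16) and State II (n = 25)
F = f_ratio(40, 42)
t, df = pooled_t(440, 40, 16, 460, 42, 25)
print(round(F, 2), round(t, 2), df)
```

An F this close to 1 is well below the usual 5% table value, which is what justifies pooling before the t-test.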


73. A random sample of 400 members is found to have a mean of 4.45
cm. Can it be reasonably regarded as a sample from a large population whose
mean is 5 cm and variance is 4? (B. Com., Bombay, 1969)

74. Two random samples give the following information regarding the
weights in pounds of a group of boys and girls:

                      Boys    Girls
Size of the sample     30      625
Mean                  145      150
Standard deviation    3.0      2.5

Do you think the two populations differ? You may assume that the populations
are normal. [Diff./S.E. = 0.55/0.1 = 5.5] (M.A. Econ., Delhi, 1971)
75. Two random samples drawn from normal populations are:
Sample I:  20, 16, 26, 27, 23, 22, 18, 24, 25, 19
Sample II: 27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37
Obtain the estimates of the variances of the populations. Test whether the two
populations have the same variance.
[F = 2.14] (B.A., Bombay, 1970)
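A sketch of the variance-ratio test for Question 75, taking Sample II as 27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37 (the reading that reproduces the printed F = 2.14). `statistics.variance` divides by n − 1, so it returns the unbiased estimates the question asks for:

```python
from statistics import variance

sample_1 = [20, 16, 26, 27, 23, 22, 18, 24, 25, 19]
sample_2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]

# Unbiased variance estimates (divisor n - 1)
s1_sq = variance(sample_1)
s2_sq = variance(sample_2)

# Larger estimate in the numerator; d.f. are (12 - 1, 10 - 1) = (11, 9)
F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
print(round(s1_sq, 2), round(s2_sq, 2), round(F, 2))
```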
of all single women hired for
two years after they are hired.
at 5% level of significance, if among 200 such secretaries,

weekly food expenditure is Rs. 2.
women shoppers chosen at random in supermarket
city, the average weekly food expenditure is Rs. 220 with a standard deviation of
Rs. 55. Test at 1% level of significance
tures of the two populations of shoppers are equal.
(M. Com., Delhi, 1969)


78. (i) We want to estimate the proportion of illiterate persons in a group
of 10,000 persons by a simple random sample of n units. What should be the
minimum sample size so that the population proportion can be determined with
an error less than 0.01?

(ii) In a large population, the mean is equal to 64 and the standard deviation
is equal to 2.5. What should be the minimum sample size so that the sample mean
may not differ from the population mean by more than 0.1 in 95% of samples?
(B.A., Bombay, 1970)
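For part (ii), the condition 1.96·σ/√n ≤ 0.1 gives n ≥ (1.96·σ/0.1)². A sketch, assuming z = 1.96 for 95% and treating the "large population" as infinite (no finite population correction):

```python
import math

def min_sample_size_for_mean(sigma, max_error, z=1.96):
    """Smallest n with z*sigma/sqrt(n) <= max_error (infinite population)."""
    n_exact = (z * sigma / max_error) ** 2
    # Round away tiny floating-point noise before taking the ceiling
    return math.ceil(round(n_exact, 9))

print(min_sample_size_for_mean(2.5, 0.1))  # 2401
```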
79. Justify the following statements:

(i) For estimation of population mean, a sample of size 200 from a
population of 10,000 units is almost as accurate as a sample of size 200 from a
population of 1,00,000.

(ii) For estimation of population mean, the sample size will have to be
made four times in order to reduce the error to half. (B.A., Bombay, 1972)
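Both statements follow from the standard error of the mean, σ/√n, multiplied by the finite population correction √((N − n)/(N − 1)) when the population size N is taken into account. A sketch comparing the two cases (σ is set to 1 because only ratios matter; the helper name is illustrative):

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """σ/√n, times the finite population correction √((N − n)/(N − 1)) when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

sigma = 1.0  # arbitrary: only ratios are compared

# (i) n = 200 from N = 10,000 versus n = 200 from N = 1,00,000
ratio_i = standard_error_of_mean(sigma, 200, 10_000) / standard_error_of_mean(sigma, 200, 100_000)

# (ii) quadrupling n from 200 to 800 (population treated as infinite)
ratio_ii = standard_error_of_mean(sigma, 800) / standard_error_of_mean(sigma, 200)

print(round(ratio_i, 3), round(ratio_ii, 3))
```

Ratio (i) is about 0.99, so the tenfold larger population barely changes the precision; ratio (ii) is 0.5, i.e. four times the sample halves the error.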




80. (a) A random sample of 10 boys had the following I.Q.s:
70, 120, 110, 101, 88, 83, 95, 98, 107, 100
Do these data support the assumption of a population mean I.Q. of 100?
(v = 9, t = 0.62)

(b) Find the 95% fiducial limits of the population mean corresponding
to (a) above. (97.2 ± 10.2)

(c) Another sample of 10 boys had the following I.Q.s:
65, 70, 75, 80, 92, 87, 85, 92, 93, 88
Can this be a random sample from the same normal population as that from
which the sample of (a) was drawn?
(t = 2.64 for v = 18, t(0.05) = 2.10)
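A sketch of the three small-sample calculations in Question 80: a one-sample t, fiducial limits x̄ ± t·s/√n, and a two-sample t with pooled variance. The table value t(0.05) = 2.262 for 9 d.f. is an assumption entered by hand:

```python
import math
from statistics import mean, stdev

iq_a = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
iq_c = [65, 70, 75, 80, 92, 87, 85, 92, 93, 88]

# (a) one-sample t against a population mean of 100, n - 1 = 9 d.f.
n = len(iq_a)
se = stdev(iq_a) / math.sqrt(n)          # stdev uses the n - 1 divisor
t_a = abs(mean(iq_a) - 100) / se

# (b) 95% fiducial limits with t(0.05, 9 d.f.) = 2.262 from the table
half_width = 2.262 * se
limits = (mean(iq_a) - half_width, mean(iq_a) + half_width)

# (c) two-sample t with pooled variance, n1 + n2 - 2 = 18 d.f.
n_c = len(iq_c)
ss = (n - 1) * stdev(iq_a) ** 2 + (n_c - 1) * stdev(iq_c) ** 2
pooled_sd = math.sqrt(ss / (n + n_c - 2))
t_c = abs(mean(iq_a) - mean(iq_c)) / (pooled_sd * math.sqrt(1 / n + 1 / n_c))

print(round(t_a, 2), [round(x, 1) for x in limits], round(t_c, 2))
```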


81. You are supplied with the following data:
n₁ = 27, n₂ = 33
r₁ = 0.4, r₂ = 0.6
Are the values r₁ and r₂ significantly different?
(|Z₁ − Z₂|/S.E. = 0.268/0.27 = 0.98. No)
82. Two independent samples have 28 and 19 pairs of observations with
correlation coefficients 0.55 and 0.75 respectively. Are these values of r consistent
with the hypothesis that the samples are drawn from the same population?
(M. Com., Delhi, 1972)
(|Z₁ − Z₂|/S.E. = 0.355/0.32)
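Questions 81 and 82 use Fisher's transformation Z = ½·ln((1 + r)/(1 − r)), whose standard error is 1/√(n − 3). A sketch of the comparison, reading r₁ = 0.4 and r₂ = 0.6 for Question 81 (the helper name is illustrative):

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """|Z1 − Z2| / S.E., with Z = ½·ln((1+r)/(1−r)) and S.E. = √(1/(n1−3) + 1/(n2−3))."""
    z1, z2 = math.atanh(r1), math.atanh(r2)   # atanh(r) equals ½·ln((1+r)/(1−r))
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return abs(z1 - z2) / se

print(round(fisher_z_difference(0.4, 27, 0.6, 33), 2))    # Question 81
print(round(fisher_z_difference(0.55, 28, 0.75, 19), 2))  # Question 82
```

Both ratios are below 1.96, so neither difference is significant at the 5% level.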


83. One make of motor car developed engine trouble in 5 races out of a
total of 100 races, and another make of motor car of the same year developed
engine trouble in 5 races out of a total of 200 races. Find out if there is a
significant difference in the two types of motor cars so far as engine defects are
concerned. [Diff/S.E. = 1.136] (M.A. Econ., Punjab, 1971)


84. In a random sample of 600 men from a certain large city, 400 are found
to be smokers; in a sample of 900 from another large city, 450 are smokers. Do
the data indicate that the cities are significantly different with respect to the
prevalence of smoking among men?
[Diff/S.E. = 6.41] (M. Com., Meerut, 1971)


85. A normal population is supposed to have mean 3 cm and standard
deviation 2.31 cm. A sample of 900 members is found to have a mean of 3.24
cm. Can it be reasonably regarded as a simple random sample of the population?
[Diff/S.E. = 3.1] (M. Com., Meerut, 1971)

86. (a) A random sample of 200 villages from Nagpur district gives the mean
population per village at 485 with a standard deviation of 50. Another random
sample of the same size from the same district gives the mean population per
village at 510 with a standard deviation of 40.

Is the difference between the mean values given by the two samples statistically
significant? Give reasons for your answer. (M. Com., Nagpur, 1971)


(b) In a random sample of 1,000 persons from town A, 40% were found to be
consumers of rice. In another random sample of 100 persons from town B, 50%
were found to be consumers of rice.

Do these data reveal a significant difference in the proportion of rice
consumers in these two towns? [Diff/S.E. = 1.94] (M. Com INOR, 1972)
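The two-proportion answers use a standard error built from the pooled proportion p = (x₁ + x₂)/(n₁ + n₂), which is the usual choice when the null hypothesis says the two population proportions are equal. A minimal sketch (the helper name is illustrative):

```python
import math

def z_two_proportions(x1, n1, x2, n2):
    """|p1 − p2| / S.E., with S.E. from the pooled proportion (x1 + x2)/(n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return abs(p1 - p2) / standard_error

print(round(z_two_proportions(5, 100, 5, 200), 3))      # Question 83
print(round(z_two_proportions(400, 1000, 50, 100), 2))  # Question 86(b)
```

Some texts use the unpooled form p₁q₁/n₁ + p₂q₂/n₂ instead, which gives slightly different ratios.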
87. The samples were drawn independently from two normal populations:
n₁ = 15, X̄₁ = 60.2, Σ(X₁ − X̄₁)² = 56.12
n₂ = 19, X̄₂ = 61.8, Σ(X₂ − X̄₂)² = 83.27

Test whether the population variances are equal. Obtain a pooled
estimate of the variance. (M.A. Econ., Delhi, 1973)
[F = 2.42, Pooled variance = 4.356]




88. A manufacturer claimed that at least 90 per cent of the machine parts
that it supplied to a factory conformed to specifications. An examination of 200
such parts revealed that 160 parts were not faulty. Determine if the
manufacturer's claim is legitimate at the 1 per cent level of significance.
[Diff/S.E. = 4.72] (M. Com., Delhi, 1975)
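For a single proportion the standard error is computed under the hypothesised value P, i.e. √(PQ/n). A sketch; the second call takes Question 64(b) as 53% of a sample of 600, the reading consistent with its printed ratio of 1.47:

```python
import math

def z_one_proportion(successes, n, p0):
    """|p − P| / √(P·Q/n), with the S.E. computed under the hypothesised proportion P."""
    p = successes / n
    return abs(p - p0) / math.sqrt(p0 * (1 - p0) / n)

print(round(z_one_proportion(160, 200, 0.90), 2))  # Question 88
print(round(z_one_proportion(318, 600, 0.50), 2))  # Question 64(b): 53% of 600
```

Question 88 is one-tailed (the claim is "at least 90 per cent"), so the ratio is compared with about 2.33 at the 1% level; the sketch gives about 4.71, against 4.72 in the key.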
89. Two random samples drawn from two normal populations are : 
Sample I: 20 16 26 27 23 22 18 24 25 19 


Sample II ; (КАКЕЙ? X 250m aRY) Р: 18 31 33 
20 27 


Obtain estimates of the variances of the populations, and test whether the 
two populations have the same variance. Use F-test. (M. Com., Meerut, 1975)


SECTION 4
χ² TEST AND GOODNESS OF FIT


1. What is the χ² test of goodness of fit? What precautions are necessary in
using this test? (M.B.A., Delhi, 1971)


2. Describe the χ² test of significance and state the various uses to which
it can be put. (M.A. Econ., Delhi, 1968)


3. Illustrate with examples the usefulness of the χ² test as a test for
independence. (M. Com., Delhi, 1972)


4. Describe the use of the χ² test in testing independence of attributes
in a 2 × 2 contingency table. (B.A., Bombay, 1970)
5. Explain how the chi-square distribution can be used (i) to test the
goodness of fit, and (ii) to test the independence of the cell-frequencies in a 2 × 2
contingency table. (I.A.S., 1965)
6. Explain how the χ² distribution can be used for judging the agreement
between a hypothetical and an observed distribution. Show how the degrees of
freedom are determined in different circumstances. (M.A. Econ., Delhi, 1970)


Discuss briefly the use of the χ² test as a test of goodness of fit. State the
conditions to be satisfied for the applicability of the test. (B.A., Bombay, 1972)


7. (a) Describe the χ² test of significance and state the various uses to
which it can be put. (M.A. Econ., Delhi, 1968; M.A., Punjab, 1971)


(b) Discuss the chi-square test of goodness of fit of a theoretical
distribution to an observed frequency distribution. State the conditions for
the validity of the χ² test. (M. Com., Delhi, 1975)
8. The following table shows price increases and decreases in markets
where credit squeeze is in operation and where it is not in operation:

Credit squeeze      Price decrease   Price increase   Total
In operation              862              10           872
Not in operation          582              18           600
Total                   1,444              28         1,472

Find whether the credit squeeze has been effective in checking price
increases. (M.A. Econ., Delhi, 1968)
[χ² = 6.57; χ²(0.05) for v = 1 = 3.841.
Credit squeeze is effective in checking price increase.]


9. The following table shows the result of inoculation against cholera in
a certain tea estate:

                  Not attacked   Attacked   Total
Inoculated             267           37       304
Not inoculated         757          155       912
Total                1,024          192     1,216

Find out whether there is any significant association between inoculation
and attack, given the following values of χ² (P = 0.05):

Degrees of freedom      1       2       3       4
Value of χ²           3.841   5.991   7.815   9.488
(M.A. Econ., Delhi, 1961; M. Com., Nagpur, 1968)
[χ² = 3.99; χ²(0.05) for v = 1 = 3.841.
Inoculation and attack are associated.]
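For a 2 × 2 table [[a, b], [c, d]] the χ² statistic can be computed directly as N(ad − bc)² divided by the product of the four marginal totals. A sketch, with an optional Yates continuity correction (subtracting N/2 from |ad − bc|) that some keys, such as Question 21, apply; the function name is illustrative:

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """χ² for the 2×2 table [[a, b], [c, d]] via N(ad − bc)² / (product of marginal totals).

    With yates=True, |ad − bc| is reduced by N/2 (Yates' continuity correction).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denominator

print(round(chi_square_2x2(267, 37, 757, 155), 2))          # Question 9
print(round(chi_square_2x2(17, 18, 3, 12, yates=True), 2))  # Question 21
```

With 1 degree of freedom, each value is compared with 3.841 at the 5% level.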


10. The following data relate to the sales, in a time of trade depression, of
a certain proprietary article in wide demand. Do the data suggest that the sales
are significantly affected by the depression?

Districts where     Districts not hit   Districts hit    Total
sales are           by depression       by depression
Satisfactory              250                 80           330
Not satisfactory          140                 30           170
Total                     390                110           500

[χ² = 2.84; χ²(0.05) for v = 1 = 3.84; Hypothesis holds good.]


11. Two sample polls of votes for two candidates A and B for a public
office are taken, one from among residents of urban areas and the other from
residents of rural areas. The results are given below. Examine whether the
nature of the area is related to voting preference in the election.

             A       B      Total
Rural       620     380     1,000
Urban       550     450     1,000
Total     1,170     830     2,000

[χ² = 10.09; table value of χ² at 5% level = 3.84.
Hypothesis does not hold good.]


12. In the course of anti-malarial work, quinine was administered to 606
adults out of a total population of 3,540. The incidence of malarial fever is
shown below. Discuss the preventive value of quinine.

               Fever    No fever    Total
Quinine           19        587       606
No quinine       193      2,741     2,934
Total            212      3,328     3,540

(M. Com., Aligarh, 1973)
[χ² = 10.58; χ²(0.05) for v = 1 = 3.84. Quinine is preventive.]


13. An experiment was conducted to test the efficacy of chloromycin in
checking typhoid. In a certain hospital chloromycin was given to 300 out of 400
patients suffering from typhoid. The number of typhoid cases is as follows:

                  Typhoid   No typhoid   Total
Chloromycin          40        260        300
No chloromycin       60         40        100
Total               100        300        400

Test the effectiveness of chloromycin in checking typhoid.
[χ² = 87.1; χ²(0.05) for v = 1 = 3.841. Chloromycin is effective in checking typhoid.]


14. 200 digits were chosen at random from a set of tables. The frequencies
of the digits were:

Digit       0   1   2   3   4   5   6   7   8   9   Total
Frequency  18  19  23  21  16  25  22  20  21  15    200

Use the χ² test to assess the correctness of the hypothesis that the digits
were distributed in equal numbers in the tables from which these were chosen.
(M.A. Econ., Punjab, 1969)
[χ² = 4.3; for v = 9, χ²(0.05) = 16.92. Hence the hypothesis seems reasonable.]


15. In an experiment on pea breeding Mendel obtained the following
frequencies of seeds: 315 round and yellow; 101 wrinkled and yellow; 108 round
and green; 32 wrinkled and green. Theory predicts that the frequencies should
be in the proportions 9 : 3 : 3 : 1.

Examine the correspondence between theory and experiment.
(χ² = 0.47; for v = 3, χ²(0.05) = 7.82)
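A goodness-of-fit χ² compares observed counts O with expected counts E obtained by splitting the observed total in the hypothesised ratio: χ² = Σ(O − E)²/E, with (number of classes − 1) degrees of freedom when no parameter is estimated from the data. A minimal sketch (the helper name is illustrative):

```python
def chi_square_gof(observed, ratios):
    """χ² = Σ(O − E)²/E, with E splitting the observed total in the given ratios."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    expected = [total * r / ratio_sum for r in ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

mendel = chi_square_gof([315, 101, 108, 32], [9, 3, 3, 1])                  # Question 15
digits = chi_square_gof([18, 19, 23, 21, 16, 25, 22, 20, 21, 15], [1] * 10)  # Question 14
print(round(mendel, 2), round(digits, 2))
```

The same function covers the other equal-frequency and fixed-ratio questions in this section, for example the 9 : 3 : 4 guinea-pig cross and the fair-die tests.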


16. For a particular experiment on hypothesis H, χ² = 9, N = 8; when
repeated, it gives the same result. Show that the two results taken together do
not give the same confidence in H as either taken separately.


17. A die is thrown 132 times with the following results:

Number turned up   1    2    3    4    5    6
Frequency         16   20   25   14   29   28

Test the hypothesis that the die is unbiased.
(χ² = 9; χ²(0.05) for v = 5 = 11.07. Hypothesis holds good.)


18. Fit a Poisson distribution to the following data and apply the χ² test
of goodness of fit:

x                 0        1       2     3    4    Total
f (observed)   17,167    1,861    124    2    1   19,155
f (expected)   17,150    1,896    105    4    0

[χ² = 5.1; for v = 3, χ²(0.05) = 7.815.
Fit is good.]
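Fitting a Poisson distribution means using the sample mean m as the parameter and computing expected frequencies N·e^(−m)·m^x/x!, most easily through the recurrence f(x) = f(x − 1)·m/x. A sketch for Question 18; it reproduces the printed expected frequencies to within one unit of rounding:

```python
import math

values = [0, 1, 2, 3, 4]
freqs = [17167, 1861, 124, 2, 1]

total = sum(freqs)
m = sum(x * f for x, f in zip(values, freqs)) / total   # sample mean used as the Poisson parameter

expected = []
term = total * math.exp(-m)        # N·e^(−m) for x = 0
for x in values:
    if x > 0:
        term *= m / x              # recurrence f(x) = f(x − 1)·m/x
    expected.append(term)

print(round(m, 4), [round(e) for e in expected])
```

Before computing χ², classes with very small expected frequencies (here x = 3 and x = 4) are normally pooled, which reduces the degrees of freedom accordingly.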


19. Of the 500 workers in a factory exposed to an epidemic, 350 in all
were attacked; 200 had been inoculated and of these 100 were attacked. Set out
these data in a table, describe the method of testing association and find whether,
in the given case, the association is or is not significant, using the following values
of chi-square (p = 0.05), if necessary. (M. Com., Calcutta, 1968)
[02-675; for v = 1, χ²(0.05) = 3.841]


20. Among 64 offspring of a certain cross between guinea pigs, 34 were
red, 10 were black and 20 were white. According to the genetic model these
numbers should be in the ratio 9 : 3 : 4. Are the data consistent with the model
at the 5 per cent level? (You are given that the value of χ² with the probability
0.05 of being exceeded is 5.99 for v = 2 and 3.84 for v = 1.) (I.A.S., 1965)
[Calculated value of χ² = 1.444. Since it is less than 5.99, the data are
consistent with the model.]


21. The following information was obtained in a sample of 50 small
general shops:

                       Shops in
                  Towns   Villages   Total
Run by men          17       18        35
Run by women         3       12        15
Total               20       30        50

Can it be said that there are relatively more women owners of small general
shops in villages than in towns?
(Use the χ² test. The 5% value of χ² for v = 1 is 3.841.)
(M. Com., Meerut, 1969)
[χ² = 2.48. Hypothesis holds good.]
22. A certain drug is claimed to be effective in curing colds. In an
experiment on 160 persons with cold, half of them were given the drug and half of
them were given sugar pills. The patients' reactions to the treatment are
recorded in the following table:

               Helped   Harmed   No effect
Drug             52       10        18
Sugar pills      44       10        26

Test the hypothesis that the drug is no better than the sugar pills for
curing colds.
(The 5% value of χ² for v = 2 is 5.991.) (M. Com., Meerut, 1969)
(χ² = 2.12. Hypothesis holds good.)
23. You are given the following data:

                     Intelligent   Unintelligent
Fathers                  boy            boy          Total
Skilled father           24             12            36
Unskilled father         32             32            64
Total                    56             44           100

Do these figures support the hypothesis that skilled fathers have intelligent
boys? (M. Com., Calcutta, 1966)
(χ² = 2.6; for v = 1, χ²(0.05) = 3.84)

24. A die is thrown 180 times with the following results:

No. turned up   1    2    3    4    5    6
Frequency      25   35   40   22   32   26

Test the hypothesis that the die is unbiased.
(χ² = 7.78; v = 5, χ²(0.05) = 11.07)
25. (a) Criticise the argument: "99 per cent of the people who drink
beer die before reaching 100 years of age. Therefore drinking beer is bad for
longevity."

(b) In a recent diet survey the following results were found in a city:

                                      Community A   Community B
Number of families taking tea             1,236          164
Number of families not taking tea           564           36

Discuss whether there is any significant difference between the two
communities in the matter of tea-taking. (M.A. Econ., Punjab, 1972)
(χ² = 15.238; for v = 1, χ²(0.05) = 3.841.
The difference is significant.)

26. Two treatments A and B were tried to control a certain type of plant
disease. The following results were obtained:

A: 400 plants were examined and 80 were found infected
B: 400 plants were examined and 20 were found infected

Is treatment B superior to treatment A?
[χ² = 41.14; v = 1, χ²(0.05) = 3.84]


27. The following figures show the distribution of digits in numbers
chosen at random from a telephone directory:

Digit        0      1      2    3     4     5     6     7    8    9    Total
Frequency  1,026  1,107  997  966  1,075  933  1,107  972  964  853  10,000

Test whether the digits may be taken to occur equally frequently in the
directory. (M.A. Econ., Delhi, 1972)
[χ² = 58.542; v = 9, χ²(0.05) = 16.919]

28. On the basis of the information given below about the treatment of
200 patients suffering from a disease, state whether the new treatment is
comparatively superior to the conventional treatment:

                      No. of patients
Treatment      Favourable      No
               response        response
New                60             20
Conventional       70             50
(M. Com., Rajasthan, 1969)
(χ² = 5.86; v = 1, χ²(0.05) = 3.84: Yes)


29. Two treatments were tried out in the control of a certain type of plant
infection, with the following results:
Treatment A: 200 plants examined and 24 found infected
Treatment B: 200 plants examined and 9 found infected.
May we conclude that treatment B is superior to treatment A in controlling
this type of infection? (B. Com., Nagpur, 1969)
(135645, v = 1, χ²(0.05) = 3.84: Yes)


30. From the following table test whether the colour of the son's eyes is
associated with that of the father's:

                          Eye colour in sons
Eye colour in father    Light   Not light   Total
Light                     230      148        378
Not light                 151      471        622
Total                     381      619      1,000

You may use the following values of χ²:
Degrees of freedom    5% values of χ²
1                         3.84
2                         5.99
3                         7.82
(M. Com., Nagpur, 1970)
[χ² = 133.2, v = 1, χ²(0.05) = 3.84; Yes]

31. From the following data, find if there is any association between
inoculation and absence of attack of typhoid:

                  Attacked   Not attacked   Total
Inoculated           12          674          686
Not inoculated       47        1,122        1,169
Total                59        1,796        1,855

(M. Com., Gorakhpur, 1968)
[χ² = 7.25, v = 1, χ²(0.05) = 3.84; Yes]
32. Test the association between extravagance in fathers and extravagance
in sons from the following data:
Extravagant fathers with extravagant sons = 327
Miserly fathers with extravagant sons = 741
Extravagant fathers with miserly sons = 545
Miserly fathers with miserly sons = 234
(Take P(χ² < 3.84) = 0.95)
(B. Com., Madras, 1970)
[χ² = 279.76, v = 1, χ²(0.05) = 3.84. There is association.]


33. Given the following contingency table for hair colour and eye colour:

                       Hair colour
Eye colour    Black   Fair   Brown   Total
Brown           10     22      32      64
Blue            15     28      29      72
Grey            25     20      19      64
Total           50     70      80     200

What can you infer about the association between the hair colour and eye
colour?
[χ² = 11.69, v = 4, χ²(0.05) = 9.49. There is association.]


34. The figures given below are (a) the theoretical frequencies of a
distribution, and (b) the frequencies of the normal distribution having the same
mean, standard deviation and total frequency as in (a):

(a) 4 JS XE ОИ 201760 270. 223.35110..529 5 
(5) 6 31 107 26 20 216 107 31 6 
Test the goodness of fit by applying a suitable test criterion.
(M.A. Econ., Delhi, 1968)
(χ² = 1.173, v = 4, χ²(0.05) = 9.49. Fit is good.)
35. In an anti-malarial campaign in a certain area, quinine was
administered to 812 persons out of a total population of 3,248. The number of
fever cases is shown below:

Treatment      Fever   No fever
Quinine           20       792
No quinine       220     2,216

Discuss the usefulness of quinine in checking malaria.
[χ² = 38.40] (M. Com., Nagpur, 1972)
36. 4 coins were tossed 160 times and the following results were obtained:
No. of heads    Observed frequencies
0                    17
1                    52
2                    54
3                    31
4                     6

Under the assumption that the coins are balanced, find the expected
frequencies of getting 0, 1, 2, 3 or 4 heads and test the goodness of fit.
[χ² = 12.725] (M. Com., Delhi, 1970)
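For balanced coins the number of heads in 4 tosses is binomial, so the expected frequencies are N·C(4, k)/2⁴. A sketch of the fit and the resulting χ² for Question 36 (`math.comb` requires Python 3.8 or later):

```python
from math import comb

observed = [17, 52, 54, 31, 6]   # 0, 1, 2, 3, 4 heads in 160 tosses of 4 coins
tosses, coins = sum(observed), 4

# Balanced coins: P(k heads) = C(4, k) / 2^4
expected = [tosses * comb(coins, k) / 2 ** coins for k in range(coins + 1)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected, round(chi2, 3))
```

No parameter is estimated from the data here, so the statistic has 5 − 1 = 4 degrees of freedom (5% table value 9.49).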
37. 1,600 families were selected at random in a city to test the belief that
high-income families usually send their children to public schools, and low-income
families often send their children to government schools. The following results
were obtained:

                  School
Income     Public   Government   Total
Low          494        506       1,000
High         162        438         600
Total        656        944       1,600

Test whether income and type of school are independent.
[χ² = 77.78] (M. Com., Delhi, 1971)


38. (a) Do the following data suggest any association between attributes A
and B? Use 5% level of significance.

           A    Not A   Total
B         10      8       18
Not B      6     26       32
Total     16     34       50

(b) Children having one parent of blood type M and the other of blood type
N will always be one of the three types M, MN and N, and the average proportions of
these wi Did IS
Out of 200 children having one M parent and one N parent, 30% were
found to be of type M, 45% of type MN and the remainder of type N. Use χ²
to test the hypothesis. [(a) χ² = 5.58, (b) χ² = 4.5] (B.A., Bombay, 1969)


39. A die was thrown 120 times and the following frequency distribution
was observed:

Face                 1    2    3    4    5    6
Observed frequency  25   17   15   23   24   16

Find the expected frequencies on the hypothesis that the die is fair. Using
a significance level of 5%, test the goodness of fit. [χ² = 5] (B.A., Bombay, 1971)


40. A movie producer is bringing out a new movie. In order to map out
his advertising campaign, he wants to determine whether the movie will appeal
most to a particular age group or whether it will appeal equally to all age groups.
The producer takes a random sample from persons attending a preview showing of
the new movie, and obtains the following results:

                               Age groups
                     Under 20   20-39   40-59   60 & over
Liked the movie         320       80      110       200
Disliked the movie       50       15       70        60
Indifferent              30        5       20        40

How do you think the firm should conduct its advertising campaign with
respect to this movie? [χ² = 57.97]
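For an r × c contingency table each expected frequency is (row total × column total)/N, and the statistic has (r − 1)(c − 1) degrees of freedom. A minimal sketch applied to Question 40 (the function name is illustrative):

```python
def chi_square_contingency(table):
    """χ² test of independence for an r×c table of observed counts.

    Returns the statistic and its degrees of freedom (r − 1)(c − 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

movie = [[320, 80, 110, 200],   # liked
         [50, 15, 70, 60],      # disliked
         [30, 5, 20, 40]]       # indifferent
print(chi_square_contingency(movie))
```

The statistic (about 58 on 6 d.f.) is far above the 5% table value of 12.59, so liking the movie is not independent of age group.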


41. A certain type of surgical operation can be performed either with a
local anaesthetic or with a general anaesthetic. Results are given below:

            Alive   Dead
Local        511     24
General      173     21

Test for any difference in the mortality rates associated with the different
types of anaesthetic. What qualification would have to be imposed on any
conclusion from such data as to the desirability of one or other method of
anaesthesia? (M.A. Econ., Delhi, 1972)
(χ² = 9.81, P ≈ 0.002)


42. Five hundred students at school were graded according to their
intelligence and the economic conditions of their homes. Examine whether there is
any association between economic conditions at home and intelligence.

                    Intelligence
Economic
conditions       Good     Bad
Rich              85       75
Poor             165      175

(B.A., Bombay, 1973)
(χ² = 0.92; No)


43. The theory predicts that the proportions of beans in the four groups A, B,
C and D should be 9 : 3 : 3 : 1. In an experiment among 1,600 beans, the numbers
in the four groups were 882, 313, 287 and 118. Does the experimental result
support the theory? (The table value of χ² for 3 d.f. at 5% level of significance
is 7.81.) (M.B.A., Delhi, 1975)
[χ² = 4.726]
SECTION 5

ANALYSIS OF VARIANCE

1. (a) What is analysis of variance? Explain clearly the technique of
analysis of variance for data with one-way classification.
(B. Com., Bombay, 1968)
(b) Explain, with illustrations, the analysis of variance technique.
(M. Com., Gorakhpur, 1966)
2. Describe the technique of analysis of variance for a two-way
classification. (B. Com., Bombay, 1966)
3. Discuss the fundamental principles of analysis of variance with special
reference to the assumptions made therein. (B. Com., Bombay, 1967)


4. (a) Describe the technique of the analysis of variance. Write down
the analysis of variance table for a one-way lay-out dealing with homogeneity of
data relating to k groups. (B. Com., Bombay, 1968)
(b) Explain briefly the technique of analysis of variance for either one-way
or two-way classification data. (B.A., Bombay, 1970)


5. To assess the significance of possible variations in performance in a
certain test as between the grammar schools of a city, a common test was given
to students taken at random from the senior fifth form of each of the four schools.
Carry out analysis of variance and comment on the results:


Schools Marks obtained by the Students 
A 8 7 4 5 5 » 6 6 7 
B 7 5 S 4 3 4 6 
C 5 3 4 4 3 5 4 4
D 10 5 6 4 8 7 8 8
[F = 7.05] (B. Com., Bombay, 1967)


6. Test the hypothesis that there is no difference between the mean
mental ages of the schools on the basis of the mental ages of 6 random samples of
6 students each taken from 6 different schools as given below. The 1% value of
F for v₁ = 5 and v₂ = 30 is given to be 3.7.

                 Mental ages in schools
Individuals    1     2     3     4     5     6
1            158   156   160   169   164   153
2            157   155   158   155   163   148
3            153   154   156   148   162   145
4            151   153   155   147   160   144
5            144   151   150   146   154   144
6            143   149   145   145   151   136
Mean         151   153   154   150   159   145
[F = 3.36] (M.A. Econ., Lucknow, 1972)


7. The following table gives the retail prices of a commodity in some
shops selected at random in four cities:

City    Prices (in Rs. per lb.)
A       22   24   27   23
B       20   19   23    -
C       19   17   21   18
D       24   25   29   26

Carry out the analysis of variance to test the significance of the difference
between the prices of the commodity in four cities. (B. Com., Bombay, 1966)


8. In a swine-feeding experiment the following results were obtained. The
three rations A, B and C differed in the substances providing the vitamins. The
animals were in groups of 3 each, the grouping being on the basis of litter and
initial weight. It may be assumed that the grouping is a matter of replication only.

                     Gain in weight
                         Group
Ration       1      2      3      4     Total
A           7.0   16.0   10.5   13.5    47.0
B          14.0   15.5   15.0   21.0    65.5
C           8.5   16.5    9.5   13.5    48.0
Total      29.5   48.0   35.0   48.0   160.5

Carry out the analysis of variance for the above data. (B. Com., Bombay, 1968)
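With one observation per ration-group cell, the two-way analysis splits the total sum of squares into rations (rows), groups (columns) and error, each sum of squares being computed with the correction factor G²/N. A sketch, using the swine table as read above (rows A, B, C; columns groups 1 to 4); the function name is illustrative:

```python
def two_way_anova(table):
    """F ratios for rows and columns in a two-way classification, one value per cell."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n
    ss_total = sum(x ** 2 for row in table for x in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - correction
    ss_error = ss_total - ss_rows - ss_cols
    ms_error = ss_error / ((r - 1) * (c - 1))
    f_rows = (ss_rows / (r - 1)) / ms_error
    f_cols = (ss_cols / (c - 1)) / ms_error
    return f_rows, f_cols

# Question 8: rations A, B, C (rows) by groups 1-4 (columns)
gains = [[7.0, 16.0, 10.5, 13.5],
         [14.0, 15.5, 15.0, 21.0],
         [8.5, 16.5, 9.5, 13.5]]
f_rations, f_groups = two_way_anova(gains)
print(round(f_rations, 2), round(f_groups, 2))
```

The ratios have (2, 6) and (3, 6) degrees of freedom; the standard 5% table values are 5.14 and 4.76 respectively.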


9. Four machines produce steel wire; the following data give the
diameters at ten positions along the wire for each machine. Examine, by
performing the analysis of variance, whether the machine means can be regarded as
constant.


Machine Diameters in thousandth of an inch 


A 1232 13» «16 465 14, I5 15 16 7 
By PAD ytd ЈАТ _ 16 465 18.7 у UIS 
СТЬ 2517 143197 482 37-2 17. . 16 , 15 
D.23 21.25. 21. 96 4 272 24. 029 21 


[F = 32.8] (M. Com., Gorakhpur, 1969)


10. Apply the technique of analysis of variance to the following data,
relating to yields of 4 varieties of wheat in 3 blocks:

                 Blocks
Varieties     1     2     3
I            10     9     8
II            7     7     6
III           8     5     4
IV            5     4     4

[F = 7.78] (B. Com., Bombay, 1967)


11. A certain company had four salesmen A, B, C and D, each of whom
was sent for a month to three types of area: countryside K, outskirts of a city O
and shopping centre of a city S. The sales in hundreds of rupees per month are:

                Salesmen
Districts     A     B     C     D
K            30    70    30    30
O            80    50    40    70
S           100    60    80    80

Carry out an analysis of variance and interpret the results.
[F = 0.28] (B. Com., Bombay, 1970)
12. The following table represents the yield of wheat in bushels per acre
for trial plots of land treated with four different levels of fertilizer. Each level
was applied to five plots randomly chosen over a field:

                  Treatment
Plot number    1     2     3     4
1             21    24    34    40
2             25    33    26    47
3             31    34    38    39
4             17    39    32    41
5             26    35    35    33

Carry out an analysis of variance and state your conclusions.
[F = 8.3] (B. Com., Bombay, 1970)
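For a one-way classification, F is the between-groups mean square over the within-groups mean square, with k − 1 and N − k degrees of freedom. A minimal sketch applied to the four fertilizer levels of Question 12 (each column of the table is one group; the function name is illustrative):

```python
def one_way_anova(groups):
    """F = (SS between / (k − 1)) / (SS within / (N − k)) for a one-way classification."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Question 12: the four fertilizer levels (columns of the table)
levels = [[21, 25, 31, 17, 26],
          [24, 33, 34, 39, 35],
          [34, 26, 38, 32, 35],
          [40, 47, 39, 41, 33]]
print(one_way_anova(levels))
```

The same function handles unequal group sizes, so it also applies to the completely randomized design in Section 6.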
13. Four experimenters determine the moisture content of samples of a
powder, each man taking a sample from each of six consignments. Their
assessments are:

                    Consignment
Observer    1    2    3    4    5    6
1           9   10    9   19   11   11
2          12   11    9   11   10   10
3          11   10   10   12   11   10
4          12   13   11   14   12   10

Perform an analysis of variance on these data and discuss whether there is any
significant difference between consignments or between observers.
f F (between J lage 

F (between observer) 
L =201 


14. Four different drugs have been developed for a certain disease. These
drugs are used in 3 different hospitals and the results given below show the
number of cases of recovery from the disease per 100 people who have taken the
drugs.

        A₁    A₂    A₃    A₄
B₁      19     8    23     8
B₂      10     9    12     6
B₃       m    13    13    10

What conclusions can you draw? [F = 5.39]


15. (a) Explain briefly the technique of analysis of variance for data with
one-way classification.

(b) The following figures relate to production in kg of three varieties A,
B and C of wheat sown in 12 plots:

A   14, 16, 18
B   15,013): | оэ
C   18, 16, 19, 19, 20

Is there any significant difference in the production of the three varieties?
[F = 1.13] (B.A., Bombay, 1969)


16. Samples of pigs of each of four types were fed on the same ration
over a period. The figures in the table denote increase in weight in lb. per pig at
the end of the period. Do the sample means differ significantly?

           Type of pig
   A       B       C       D
  6.1    14.6     irs    13.4
 13.8    15.7    16.0    20.2
  8.7    11.8     9.0    12.9
 12.0    16.5    13.3    12.5

[F = 1.894; for v₁ = 3, v₂ = 12, F(0.05) = 3.49.
Sample means do not differ significantly.]


SECTION 6
DESIGN OF EXPERIMENTS

1. What is a Latin square? Point out its significance and limitations, if
any.

2. Explain with illustration the procedure of constructing a Latin square.
3. What is a randomized block design?
4. Write short notes on:
(i) Latin square or randomized block design.
(ii) Design of experiments.
(iii) Factorial experiments.
5. The following data represent the number of units of production per
day turned out by 5 different workmen using different types of machines:

(a) Test to see whether the mean productivity is the same for the 4 different
machine types.

(b) Test to see whether the 5 men differ with respect to mean productivity.

                  Machine type
Workmen      A     B     C     D
1           44    38    47    36
2           46    40    52    43
3           34    36    44    32
4           33    38    46    33
5           38    42    49    39

(I.A.S., 1968)

[Between machine types, F = 17.75
Between workmen, F = 1.58
There is significant difference in the mean productivity of machines.]


6. The following table gives wheat yields (bushels per acre) for five fertilizer
treatments of plots arranged in a Latin square. Test the significance of column,
row and treatment effects at 1 per cent level.

                         Columns
Rows      1        2        3        4        5
1      34 (C)   21 (A)   52 (E)   24 (B)   40 (D)
2      33 (B)   45 (E)   47 (D)   26 (C)   25 (A)
3      31 (A)   38 (C)   34 (B)   39 (D)   38 (E)
4      44 (E)   41 (D)    3 (C)   17 (A)   39 (B)
5      33 (D)   35 (B)   26 (A)   46 (E)   35 (C)

(Between columns 1.428, effect not significant;
between rows 0.08, effect not significant;
between treatments 11.1, effect significant;
error mean square = 28.5)


7. Three varieties A, B, C of wheat are tested in a completely randomized design with five replications each, the layout and the yields being shown in the following table. Set up the analysis of variance table and test whether the varieties differ significantly among themselves with regard to yield.

A 20 B 18 A 16 C 25 C 22
C 28 A 21 B 20 A 20 B 17
B 15 C 32 B 25 C 28 A 23
[F = 8.15] (M. Com., Meerut, 1973)
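The printed answer to Exercise 7 can be checked with the usual correction-factor arithmetic. The Python sketch below is illustrative only: it regroups the layout by variety (reading the garbled "R18" in the first row as B 18) and computes the between-variety and within-variety sums of squares and the F ratio. It gives F ≈ 8.14 on (2, 12) degrees of freedom, which agrees with the printed 8.15 up to rounding; comparing F with its tabulated value is left to the reader.

```python
# One-way analysis of variance for Exercise 7 (completely randomized design).
yields = {
    "A": [20, 16, 21, 20, 23],
    "B": [18, 20, 17, 15, 25],
    "C": [25, 22, 28, 32, 28],
}


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F ratio)."""
    values = [y for g in groups.values() for y in g]
    n_total, k = len(values), len(groups)
    correction = sum(values) ** 2 / n_total                  # T^2 / N
    ss_total = sum(y * y for y in values) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
    ss_within = ss_total - ss_between
    df_b, df_w = k - 1, n_total - k
    f_ratio = (ss_between / df_b) / (ss_within / df_w)
    return ss_between, ss_within, df_b, df_w, f_ratio


ssb, ssw, dfb, dfw, f = one_way_anova(yields)
print(f"SS between = {ssb:.1f} on {dfb} df, SS within = {ssw:.1f} on {dfw} df, F = {f:.2f}")
```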
SECTION 7 


STATISTICAL QUALITY CONTROL 


1. What do you understand by statistical quality control? Point out its usefulness in industry. (I.C.W.A., July, 1973)

2. How does statistical quality control help you in industry? Describe the procedure for drawing a control chart during production and indicate how you detect lack of control in the production process. (I.C.A., Jan., 1967)


3. Discuss briefly the need and utility of statistical quality control in 
industry. (M.B.A., Delhi, 1969) 
4. (a) What is a control chart? Describe how a control chart is constructed and used. (I.C.W.A., Jan., 1965)

(b) Explain the difference between control limits and tolerance limits.
5. (a) What is meant by process control in industrial statistics?

(b) Explain clearly the construction and function of (i) the X̄ chart and (ii) the R chart. (I.C.W.A., July, 1966)


6. Discuss the basic principles underlying control charts. Explain in brief how control limits are determined for (i) a p-chart and (ii) a c-chart. (M. Com., Bombay, 1969)


7. Distinguish between process control and product control. State the different types of acceptance sampling plans, explaining their merits and demerits. (M. Com., Delhi, 1971)
8. (a) Explain the principles on which a control chart is based. (I.C.W.A., July, 1967)
(b) Discuss the role of control charts in the manufacturing process. (B. Com., Bombay, 1967)


9. When is a manufacturing process said to be in a state of statistical control? Discuss the importance of control charts in manufacturing industry. (B. Com., Bombay, 1968)


10. Describe briefly the working of the p-chart. 
(B. Com., Bombay, 1966) 


11. Distinguish between specification limits, control limits and confidence limits. What do you understand by a rational sub-group? Discuss its importance in setting up control charts for variables and also for attributes. (I.C.W.A., Jan., 1966)


12. Write a brief note on the method of constructing control charts for X̄ and R, giving the formulae for the upper and lower control limits in both the cases. (M. Com., Meerut, 1970)
13. (a) What are the considerations for introducing statistical quality control?
(b) Describe the construction and use of X̄ and R charts. (I.C.W.A., July, 1970)
14. "Quality control is attained most efficiently, of course, not by the … itself but by getting at the …." Comment on the … of the various devices employed for the maintenance of quality in a … of manufactured products. (B. Com., Bombay, 1965)


15. From a machine set up to produce glass beads each of weight 2.0 gm, a sample of 5 beads is taken every two hours and the data obtained are given below. Draw a control chart for the mean weight and examine whether the process is under control or not. You are given that for n = 5, the control factor for the mean is 0.58.

Sample No. 1 2 3 4 5 6 7 8 9 10
Mean weight (gm) 2.00 2.10 2.45 2.09 2.04 2.05 2.00 2.15 2.17 [illegible]
Range (gm) 0.10 0.25 0.30 0.20 0.20 0.25 0.20 0.30 0.25 1.15
[CL = 2, UCL = 2.1856, LCL = 1.8144] (B. Com., Bombay, 1968)
16. Construct a suitable control chart for the following data and state your conclusions.


Sample No. 1 QU go WEARS COT 8 905330 
(each of 100 items)

No. of defectives 12 10:5 7671.87 2917 Ж TENE Omelet 8 
[CL = 9, UCL = 17.58, LCL = 0.42] (B. Com., Bombay, 1967)


17. From a factory producing metal sheets, a sample of 5 sheets is taken every hour and the data obtained are as under. Draw a control chart for the mean and examine whether the process is under control or not. You are given that for n = 5, A₂ = 0.58.

Sample No. 1 2 3 4 5 6 7 8 9 10
Mean thickness of sheet .025 .032 .042 .022 .028 .010 .025 .040 .026 .029
Sample range .025 .048 .012 .012 .019 .010 .006 .046 .010 .032
[CL = 0.0279, UCL = 0.0407, LCL = 0.0151] (B. Com., Bombay, 1967)


18. In a manufacturing concern producing radio transistors, lots of 250 items are inspected at a time. Considering the number of defectives in 20 lots shown in the table below, draw a suitable control chart and write a brief report based on the evidence of the chart.


Defectives in each lot 


Lot No. 1 2 3 4 5 6 7 8 9 10
No, of defectives 25 47 23 36 247 3439 130103537 22 
Lot No. 11 12 13 14 15 16 17 18 19 20
No. of defectives 45 40 32 35 2 40 15177287923 102 
[CL = 31.9, UCL = 85.14, LCL = 0] (B. Com., Bombay, 1966)


19. (a) In the production of certain rods, a process is said to be under control if the outside diameters have a mean of 2.500 inches and a standard deviation of 0.002 inch. Construct a control chart for the mean of random samples of size 4, showing the central line, the upper control limit and the lower control limit on graph paper. [CL = 2.500, UCL = 2.503, LCL = 2.497]

(b) Construct a control chart for the proportion of defectives obtained in repeated random samples of size 100 from a process which is considered to be under control when the true proportion of defectives P is equal to 0.20. Draw the central line and the upper and lower control limits on graph paper.
[CL = 0.20, UCL = 0.32, LCL = 0.08] (M. Com., Meerut, 1969)
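Both parts of Exercise 19 use standards fixed in advance, so the limits follow directly from μ ± 3σ/√n and P ± 3√(P(1 − P)/n). A minimal Python sketch, with hypothetical helper names, reproduces the printed limits:

```python
import math


def xbar_limits_known(mu, sigma, n, k=3.0):
    """(LCL, CL, UCL) for sample means when the process mean and s.d. are specified."""
    half_width = k * sigma / math.sqrt(n)
    return mu - half_width, mu, mu + half_width


def p_limits_known(p, n, k=3.0):
    """(LCL, CL, UCL) for the proportion defective when P is specified; LCL floored at 0."""
    half_width = k * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), p, p + half_width


print("(a)", [round(v, 3) for v in xbar_limits_known(2.500, 0.002, 4)])
print("(b)", [round(v, 2) for v in p_limits_known(0.20, 100)])
```

The same helper with k=2 and k=3, called as xbar_limits_known(0.5230, 0.0032, 4, k=2), gives the 2-sigma and 3-sigma limits asked for in Exercise 32 (0.5198 to 0.5262 cm and 0.5182 to 0.5278 cm).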


20. The following data show the number of defects per group in 20 successive groups of 5 radio sets each:

Group No. of defects per group | Group No. of defects per group
1 77 | 11 59
2 64 | 12 54
3 75 | 13 41
4 93 | 14 89
5 45 | 15 40
6 61 | 16 22
7 49 | 17 92
8 65 | 18 89
9 45 | 19 55
10 77 | 20 25

Plot the control chart for the number of defects per group of 5 radios, and comment on the findings. (B. Sc., Sardar Patel, 1969)

[CL = 60.85, UCL = 84.25, LCL = 37.45]
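For a c-chart the number of defects is treated as a Poisson count, so the central line is c̄ and the 3-sigma limits are c̄ ± 3√c̄. The Python sketch below is illustrative. It reproduces the printed limits for Exercise 20 and lists the groups that fall outside them, which is the evidence the exercise asks you to comment on:

```python
import math

# Defects per group of 5 radio sets, groups 1 to 20 (Exercise 20)
defects = [77, 64, 75, 93, 45, 61, 49, 65, 45, 77,
           59, 54, 41, 89, 40, 22, 92, 89, 55, 25]

c_bar = sum(defects) / len(defects)        # central line
half_width = 3 * math.sqrt(c_bar)          # Poisson model: variance equals the mean
ucl, lcl = c_bar + half_width, max(0.0, c_bar - half_width)
outside = [(group, c) for group, c in enumerate(defects, start=1) if not lcl <= c <= ucl]

print(f"CL = {c_bar:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
print("Groups outside the limits:", outside)
```

Groups 4, 14, 17 and 18 lie above the upper limit and groups 16 and 20 below the lower limit, so on this chart the process does not appear to be in statistical control.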
21. The following table gives the daily number of articles inspected and the number of defectives for a machine:

Sample No. No. inspected No. of defectives | Sample No. No. inspected No. of defectives
1 140 6 | 11 124 2
2 149 8 | 12 90 8
3 153 9 | 13 107 6
4 151 1 | 14 141 4
5 149 9 | 15 150 4
6 151 3 | 16 124 4
7 110 0 | 17 166 13
8 120 5 | 18 155 7
9 153 6 | 19 142 5
10 150 3 | 20 143 5

Plot a suitable control chart for the proportion of defectives and examine whether the process may be regarded as under control.
[CL = 0.039, UCL = 0.088, LCL = 0] (B. Sc., Sardar Patel, 1969)
22. (a) Explain the importance of statistical quality control in industry.

(b) A specified dimension of a machined component is 0.05 inch. Five … appropriate limits. Discuss what evidence the chart gives of unstable production and give your inference.


Sample 1 2 3 4 5 6 7 8
+10 -п -5 T7 —4 —1 +8 +11 
+17 —-1  —14 +4 +1 +3 +9 +4 
=i —10 +3 +5 +6 +12 +2 +8 
—l +10 +414 +4 +2 -5 +3 +13 
S E з вы НУ г Tu +H 

Sample 9 10 11 12 13 14 15
+3 +8 = TA +1 -3 —5 
+6 —8 +7 +4 —4 +4 
—4 +2 =] -1 +7 +1 +2 
СЕИ 

CL-005,0 +1 0 +1 


CL=00015, UCL- 6 00H ЕССЕ] (Diploma in Management, Madras. 1967) 


23. Distinguish between control charts for "defectives" and control charts for "defects".

Draw the control chart for "defectives" from the following data. What do you infer from the chart?

Sample No. No. inspected No. of defectives | Sample No. No. inspected No. of defectives
1 200 5 | 6 200 6
2 150 6 | 7 250 8
3 300 8 | 8 200 6
4 150 4 | 9 200 6
5 100 3 | 10 150 7
[CL = 5.9, UCL = 13.07, LCL = 0] (Diploma in Management, Madras, 1965)
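The printed limits for Exercise 23 are consistent with an np-chart built on the average sample size (n̄ = 190) and the pooled fraction defective p̄ = 59/1900. The Python sketch below assumes that approach. It is one reasonable reading of the printed answer, not necessarily the method the text intends:

```python
import math

inspected = [200, 150, 300, 150, 100, 200, 250, 200, 200, 150]
defectives = [5, 6, 8, 4, 3, 6, 8, 6, 6, 7]

n_bar = sum(inspected) / len(inspected)        # average sample size (190)
p_bar = sum(defectives) / sum(inspected)       # pooled fraction defective
cl = n_bar * p_bar                             # average number of defectives per sample
half_width = 3 * math.sqrt(n_bar * p_bar * (1 - p_bar))
ucl, lcl = cl + half_width, max(0.0, cl - half_width)

print(f"CL = {cl:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
```

All ten counts lie within these limits. With sample sizes ranging from 100 to 300, a chart with separate limits for each sample would be the more exact treatment.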


24. Quality control is maintained in a f… with the help of X̄ and standard deviation (σ) charts. Ten … samples in all were chosen, whose ΣX̄ was 595.9 and Σσ was 8.98. … for X̄ and σ charts. You may use the following factors for control limits:

n A₁ B₃ B₄
… 1.00 0.28 1.72

[X̄ chart: CL = 33.1, UCL = 33.6089, LCL = 32.6111; σ chart: CL = 0.4989, UCL = 0.8581, LCL = 0.1397] (M. Com., Delhi, 1970)




25. Explain the following in connection with SQC:
(i) Specification limits and control limits.
(ii) Action limits and warning limits.
(iii) Chance variations and variations due to assignable causes.
(iv) Rational sub-group.
(v) Defects and defectives. (B. Com., Bombay, 1970)
26. The following table shows a series of pinhole tests of paper intended to be impervious to oils. Specimen sheets 30 × 50 in size were taken from production at intervals and coloured ink was applied to one side of each sheet. Each individual ink blot which appeared on the other side of the sheet within five minutes was counted as a defect.

Sheet No. No. of pinholes | Sheet No. No. of pinholes
1 9 | 9 8
2 9 | 10 7
3 5 | 11 6
4 8 | 12 4
5 5 | 13 7
6 9 | 14 6
7 9 | 15 14
8 1 |

Construct a control chart for the above data and comment on the state of control. (B. Com., Bombay, 1970)
27. The following table shows the results of inspection on five separate batches:

Batch Number inspected Number of defectives
1 1,000 60
2 800 60
3 500 42
4 200 30
5 50 15

Set up control limits. Do you think that the process was in a state of control?
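Because the five batches in Exercise 27 differ greatly in size, one common approach is a p-chart whose limits are recomputed for each batch from the pooled fraction defective. The Python sketch below follows that approach. No answer is printed for this exercise, so the output is offered only as a worked check:

```python
import math

# (number inspected, number of defectives) for the five batches in Exercise 27
batches = [(1000, 60), (800, 60), (500, 42), (200, 30), (50, 15)]

p_bar = sum(d for _, d in batches) / sum(n for n, _ in batches)   # pooled fraction defective

results = []
for batch, (n, d) in enumerate(batches, start=1):
    half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)   # limits widen as n shrinks
    lcl, ucl = max(0.0, p_bar - half_width), p_bar + half_width
    p = d / n
    status = "in" if lcl <= p <= ucl else "out"
    results.append((batch, p, lcl, ucl, status))
    print(f"batch {batch}: p = {p:.3f}, LCL = {lcl:.4f}, UCL = {ucl:.4f} -> {status}")
```

With these limits batches 4 and 5 fall above their upper limits, which suggests that the process was not in a state of control.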

28. The answering of calls at a switchboard may be thought of as a process. Each call is a unit of product and the time the caller waits to be answered is a measure of the quality of the service rendered. Five calls, chosen at random, are timed during each hour the board is open. Results for the last 10 hours show (in seconds):

Hour 1 2 3 4 5 6 7 8 9 10
X̄ 20 34 45 39 26 29 13 34 37 23
R 23 39 15 5 20 17 21 11 40 10
Construct an X̄ chart and an R chart and determine whether this process is in control. (M. Com., Delhi, 1971)
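Exercise 28 does not print the control chart factors, so the Python sketch below assumes the standard table values for samples of five (A₂ = 0.577, D₃ = 0, D₄ = 2.114). It computes the central lines and 3-sigma limits from the grand mean and R̄ and flags the hours that fall outside:

```python
# X-bar and R charts for the switchboard waiting times (Exercise 28), n = 5 calls per hour.
xbars = [20, 34, 45, 39, 26, 29, 13, 34, 37, 23]
ranges = [23, 39, 15, 5, 20, 17, 21, 11, 40, 10]

# Standard table factors for n = 5 (assumed; the exercise does not print them).
A2, D3, D4 = 0.577, 0.0, 2.114

x_dbar = sum(xbars) / len(xbars)       # central line of the X-bar chart
r_bar = sum(ranges) / len(ranges)      # central line of the R chart
x_lcl, x_ucl = x_dbar - A2 * r_bar, x_dbar + A2 * r_bar
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar

x_out = [hour for hour, x in enumerate(xbars, start=1) if not x_lcl <= x <= x_ucl]
r_out = [hour for hour, r in enumerate(ranges, start=1) if not r_lcl <= r <= r_ucl]

print(f"X-bar chart: CL = {x_dbar:.2f}, LCL = {x_lcl:.2f}, UCL = {x_ucl:.2f}, outside: {x_out}")
print(f"R chart:     CL = {r_bar:.2f}, LCL = {r_lcl:.2f}, UCL = {r_ucl:.2f}, outside: {r_out}")
```

On these figures hours 3 and 7 fall outside the X̄ limits, while every range lies inside the R chart limits.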
29. (a) What do you understand by acceptance sampling?

(b) In order to determine whether or not a process producing bronze castings is in control, 20 subgroups of size 6 are taken. The quality characteristic of interest is the weight of the castings, and it is found that X̿ = 3.126 gm and R̄ = 0.009 gm.

(i) Estimate the standard deviation of the weight of castings.
(ii) Assuming that the process is in control, find upper and lower control limits for the sub-group means.
(iii) Assuming that the process is in control, find upper and lower control limits for the sub-group ranges.
(iv) Using (i) above, within what limits would you expect 99.73 per cent of all individual measurements to fall?

You may use the following control chart constants:

n d₂ A₂ D₃ D₄ B₃ B₄
4 2.059 0.729 0 2.282 0 2.266
5 2.326 0.577 0 2.115 0 2.089
6 2.534 0.483 0 2.004 0 1.970

(M. Com., Delhi, 1972)
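The four parts of Exercise 29(b) use only the n = 6 row of the constants table: σ is estimated as R̄/d₂, and the limits follow from A₂, D₃ and D₄. A short Python sketch of the arithmetic (values rounded in the printout):

```python
# Exercise 29(b): n = 6, grand mean 3.126 gm, mean range 0.009 gm.
d2, A2, D3, D4 = 2.534, 0.483, 0.0, 2.004      # n = 6 row of the constants table
x_dbar, r_bar = 3.126, 0.009

sigma_hat = r_bar / d2                                                  # (i)
xbar_lcl, xbar_ucl = x_dbar - A2 * r_bar, x_dbar + A2 * r_bar           # (ii)
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar                                   # (iii)
indiv_lcl, indiv_ucl = x_dbar - 3 * sigma_hat, x_dbar + 3 * sigma_hat   # (iv) 99.73% band

print(f"(i)   sigma is about {sigma_hat:.5f} gm")
print(f"(ii)  means:       {xbar_lcl:.4f} to {xbar_ucl:.4f} gm")
print(f"(iii) ranges:      {r_lcl:.4f} to {r_ucl:.4f} gm")
print(f"(iv)  individuals: {indiv_lcl:.4f} to {indiv_ucl:.4f} gm")
```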
30. (a) Write a brief note on the method of constructing control charts for X̄ and R, giving the formulae for the central line and the upper and lower control limits in both cases. Sample means and sample ranges are given. (M. Com., Meerut, 1973)

(b) Repeated random samples of size 100 were taken fifty times and the number of defective units processed was found as follows:

Number of defective units in each of the 50 samples:
1 2 2 4 1 3 2 4 2 2
0 1 3 2 4 3 2 1 1 2
2 3 0 2 1 0 1 2 3 2
4 0 2 1 5 1 3 5 2 1
0 2 5 1 3 0 1 3 2 1

Construct a control chart for proportion defective, drawing the central line and the upper and lower control limits on graph paper.

[CL = 0.02, UCL = 0.062, LCL = 0] (M. Com., Meerut, 1972)


‘Sample No, E 2, 3, 4, 5, 
No. of defectives 2. te 37 3, 2 M 4, 2, 2, 0 


[CL = 2, UCL = 6.2, LCL = 0] (M. Com., Meerut, 1971)
32. A drilling machine bores holes with a mean diameter of 0.5230 cm and a standard deviation of 0.0032 cm. Calculate the 2-sigma and 3-sigma upper and lower control limits for means of samples of 4, and prepare a control chart on graph paper. (M. Com., Meerut, 1971)
33. The following are the numbers of defects noted in the final inspection

of 30 bales of woollen cloth +0, 3, 1, 4, 2, 2, 1, 3, 5, 0, 25070,3; 2, 4, 3, 

0,0. 0, 1, 2, 4, 5, 0,9, 4 1, 0, and 3. 

Construct a control chart for c, the number of defects, plot the given data, and comment on the state of control. [CL = 2.0667, UCL = 6.3795, LCL = 0]
34. A manufacturer of transistors found the following number of defectives in 25 subgroups of 50 transistors:


6 Te 585-3 10 


3, Si 4, 2: 2 zb Ts 0, 2, 4, 2, 3, 4 
L 27 4, 8, 2. 4, 2; 6, 4, 3, 1, 4 
Construct a control chart for the fraction defective, plot the sample data on the chart, and comment on the state of control.

[CL = 0.0656, UCL = 0.1706, LCL = 0]
35. The following table gives the number of defects observed in 8 woollen carpets passing as satisfactory. Construct the control chart for the number of defects:

Serial No.of carpets 1 2 3 4 5 6 

No. of defects 2 5 5 6 1 5 1 ; 

[CL = 4, UCL = 10, LCL = 0] (M. Com., Meerut, 1973)


36. (a) The following table gives the number of defective items found in 20 successive samples of 100 items each:

2 6 2 4 4 18 0 4 10 18
2 4 6 4 8 0 2 2 4 0

Find the average fraction defective p̄, and plot a complete control chart for fraction defective on graph paper. Was the process ever out of control? (M. Com., Meerut, 1973)
[CL = 0.05, UCL = 0.1154, LCL = 0]

(b) A sample of 100 iron rods is said to be drawn from a large number of rods whose lengths are normally distributed with mean 3 ft and standard deviation 0.6 ft. Can the sample be regarded as a truly random sample? (M. Com., Meerut, 1972)
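For Exercise 36(a), where every sample contains 100 items and p̄ has to be estimated from the data, a minimal Python sketch computes the limits and reports which samples are out of control:

```python
import math

defectives = [2, 6, 2, 4, 4, 18, 0, 4, 10, 18,
              2, 4, 6, 4, 8, 0, 2, 2, 4, 0]
n = 100  # items per sample

p_bar = sum(defectives) / (n * len(defectives))          # average fraction defective
half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
lcl, ucl = max(0.0, p_bar - half_width), p_bar + half_width
out_of_control = [s for s, d in enumerate(defectives, start=1) if not lcl <= d / n <= ucl]

print(f"CL = {p_bar:.2f}, UCL = {ucl:.4f}, LCL = {lcl:.0f}; samples outside: {out_of_control}")
```

Samples 6 and 10, each with 18 defectives, exceed the upper limit, so the process was out of control at those points.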

37. Draw control charts for mean X̄ and range R from the following data relating to 20 samples, each of size 5. Only the central line and the upper and lower control limits may be drawn:

Sample No. X̄ R | Sample No. X̄ R
1 38.0 15 | 11 32.5 30
2 33.7 1 | 12 22.7 11
3 24.3 22 | 13 [illegible] 28
4 36.5 24 | 14 28.7 21
5 27.0 28 | 15 28.1 15
6 30.5 33 | 16 24.3 18
7 [illegible] 21 | 17 30.3 19
8 27.4 20 | 18 25.3 33
9 [illegible] 29 | 19 37.7 17
10 29.2 18 | 20 31.3 17

(For a sample of size 5, d₂ = 2.326, d₃ = 0.864)
(M. Com., Meerut, 1973)

38. (a) It is said that quality must be built into a product and that a control chart cannot cause a product to have high quality. How, then, does it make a contribution towards better quality?

(b) Thirty samples of 5 items each were taken from the output of a machine and a critical dimension was measured. The mean of the 30 sample means was 0.6550 inch and the average range R̄ of the 30 samples was 0.0036 inch. Compute the upper and lower control limits for the X̄ and R charts.

You may use the following factors for finding the 3-sigma limits:

Sample size A₁ D₃ D₄
5 1.596 0 2.115
6 1.410 0 2.004
(M. Com., Delhi, 1975)

39. (a) Write a brief note on the method of constructing control charts for sample mean X̄ and sample range R, giving the formulae for the upper and lower control limits in both cases. Sample means and ranges are given.

(b) A manufacturer finds that on an average 1 in 50 of the items produced by him is defective. Draw a control chart for percentage defective for samples of size 100.

(c) The following table gives the number of defects observed in 5 carpets which were regarded as standard. Construct the control chart for the number of defects.

Serial No. of carpets 1 2 3 4 5
No. of defects 5 4 3 2 6
[(b) CL = 2, UCL = 6.2, LCL = 0; (c) CL = 4, UCL = 10, LCL = 0] (M. Com., Meerut, 1975)
SECTION 8 


BUSINESS FORECASTING 


1. What is meant by business forecasting? Give a critical estimate of the methods used in business forecasting. (M. Com., Gorakhpur, 1967)


2. Distinguish between the 'historical analysis of past conditions' and the 'cross-section analysis of current events' as methods of business forecasting. (M. Com., Gorakhpur, 1969)
3. Discuss the 'Time Lag Theory' of business forecasting. What are its assumptions? (M. Com., Gorakhpur, 1966)




4. Explain 'long-term forecasting' and 'short-term forecasting'. Examine the place of time series analysis in forecasting. (M.B.A., Delhi, 1965)


5. What are the statistical techniques managers usually apply in business forecasting? Explain clearly in what situations a particular technique is preferable to others. (Business Management, Madras, 1967)


6. Explain the role of statistical methods in business forecasting. (M.B.A., Delhi, 1969)
7. What is business forecasting? What are the assumptions on which business forecasts are made?


Describe the techniques of forecasting that are commonly employed by big 
business houses. (M. Com., Delhi, 1972) 


8. Describe the statistical methods used in business forecasting. (M. Com., Delhi, 1975)


SECTION 9 
PARTIAL AND MULTIPLE CORRELATION 


1. What is partial correlation? Under what circumstances is it to be preferred to total correlation? (M. Com., Delhi, 1970)

2. Distinguish clearly between partial and multiple correlation and point out their usefulness in statistical analysis. (M.A. Econ., Patiala, 1974)

3. Distinguish between partial and multiple correlation. (M. Com., Delhi, 1971)
4. Given r₁₂ = 0.80, r₁₃ = −0.40 and r₂₃ = −0.56, find the values of r₁₂.₃, r₁₃.₂ and r₂₃.₁.
[r₁₂.₃ = 0.759, r₁₃.₂ = 0.097, r₂₃.₁ = −0.436]
5. If r₁₂ = 0.86, r₁₃ = 0.65 and r₂₃ = 0.72, show that r₁₂.₃ = 0.743.
6. Is it possible to get the following from a set of experimental data:
(a) r₂₃ = 0.8, r₃₁ = 0.5, r₁₂ = 0.6
(b) r₂₃ = 0.7, r₃₁ = −0.4, r₁₂ = 0.6 (B.Sc., Agra, 1971)
[(a) Yes (b) No]
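Exercises 4 to 6 rest on the first-order partial correlation formula r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/√((1 − r₁₃²)(1 − r₂₃²)) and on the consistency condition r₁₂² + r₁₃² + r₂₃² − 2r₁₂r₁₃r₂₃ ≤ 1 (the correlation matrix must be non-negative definite). The Python sketch below checks the printed answers; the helper names are illustrative. Because the condition is symmetric in the three coefficients, the garbled subscripts in Exercise 6 do not affect the verdict.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c, i.e. with the effect of variable c eliminated."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def consistent(r12, r13, r23):
    """True if three correlation coefficients can come from one set of data."""
    return r12 ** 2 + r13 ** 2 + r23 ** 2 - 2 * r12 * r13 * r23 <= 1


# Exercise 4: r12 = 0.80, r13 = -0.40, r23 = -0.56
r12, r13, r23 = 0.80, -0.40, -0.56
print(round(partial_r(r12, r13, r23), 3),   # r12.3
      round(partial_r(r13, r12, r23), 3),   # r13.2
      round(partial_r(r23, r12, r13), 3))   # r23.1
# Exercise 5
print(round(partial_r(0.86, 0.65, 0.72), 3))
# Exercise 6 (a) and (b)
print(consistent(0.6, 0.5, 0.8), consistent(0.6, -0.4, 0.7))
```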


7. On the basis of observations made on 35 cotton plants, the total correlations of yield of cotton (X₁), number of bolls, i.e., seed vessels (X₂), and height (X₃) are found to be r₁₂ = 0.863, r₁₃ = 0.648 and r₂₃ = 0.709. Determine the multiple correlation R₁.₂₃ and the partial correlations r₁₂.₃ and r₁₃.₂.
[r₁₂.₃ = 0.751, r₁₃.₂ = 0.101]

8. In a trivariate distribution, it is found that σ₁ = 3, … r₂₃ = 0.40, r₃₁ = 0.61, r₁₂ = 0.70. Prove that the partial correlations are:
r₂₃.₁ = −0.048, r₃₁.₂ = 0.49, r₁₂.₃ = 0.63

9. Find the multiple correlation coefficient R₁.₂₃. (B.A., Bombay, 1968)


713709, л 23=0°4193, ғ13=0°75, ғ3=0:7 
Hint : First calculate r. 
10. The following data are for the honour points Y, general intelligence scores X₁ and hours of study X₂:
Σx₁² = 250, Σx₁x₂ = 33, Σx₂² = 36,
Σx₁y = 106, Σx₂y = 22;
y, x₁ and x₂ being measured from their means.

Find the equation of the regression plane of y on x₁ and x₂ and state its usefulness. Also write down the two partial regression coefficients and …

[y = 0.3906x₁ + 0.2531x₂]
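With the variables measured from their means, the two coefficients of the regression plane in Exercise 10 solve a pair of normal equations. A small Python sketch using Cramer's rule reproduces the printed equation:

```python
# Normal equations in deviation form (Exercise 10):
#   b1 * S(x1^2) + b2 * S(x1x2) = S(x1y)
#   b1 * S(x1x2) + b2 * S(x2^2) = S(x2y)
s11, s12, s22 = 250, 33, 36
s1y, s2y = 106, 22

det = s11 * s22 - s12 ** 2             # determinant of the coefficient matrix
b1 = (s1y * s22 - s12 * s2y) / det     # Cramer's rule
b2 = (s11 * s2y - s12 * s1y) / det
print(f"y = {b1:.4f} x1 + {b2:.4f} x2")
```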




11. The table shows the corresponding values of three variables X₁, X₂ and X₃:


X 3 5 6 8 12 
X, 16 10 zi 4 3 
X, 90 72 54 42 30 12 
3 Find (i) the least square regression equation of X; X, | 
{ estimate Y, when X,=10 and Ху—=6. EM Eg, i and de аал) 
Н i 
| 2 (i) Х,=61°4—3°65 Х,+2'54Х, 
(4% оле) 


12. The following table shows the weights X₁ to the nearest pound, heights X₂ to the nearest inch and ages X₃ to the nearest year of 12 boys:

(a) Find the least square regression equation of X₁ on X₂ and X₃.
(b) Determine the estimated values of X₁ from the given values of X₂ and X₃.
(c) Estimate the weight of a boy who is 9 years old and 54 inches tall.

Weight Height Age | Weight Height Age
(X₁) (X₂) (X₃) | (X₁) (X₂) (X₃)
64 57 8 | 71 55 10
71 59 10 | 57 48 9
53 49 6 | 56 52 10
67 62 11 | 51 42 6
55 51 8 | 76 61 12
58 50 7 | 78 57 9

(M. Com., Bombay, 1969)

[(a) X₁ = 3.65 + 0.855X₂ + 1.506X₃ (b) 64.414 (c) 63.4 lb]
13. An instructor of mathematics wishes to determine the relationship of grades on a final examination to grades on two quizzes given during the semester. Calling X₁, X₂ and X₃ the grades of a student on the first quiz, second quiz and final examination respectively, he made the following computations for a total of 120 students:

X̄₁ = 6.8 X̄₂ = 7.0 X̄₃ = 7.4
s₁ = 1.0 s₂ = 0.80 s₃ = 0.90
r₁₂ = 0.60 r₁₃ = 0.70 r₂₃ = 0.65

(a) Find the least square regression equation of X₃ on X₁ and X₂.
(b) Estimate the final grades of two students who scored respectively 9 and 7, and 4 and 8, on the two quizzes.
[(a) X₃ = 1.61 + 0.436X₁ + 0.404X₂ (b) 8.362 and 6.586]
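Exercise 13 gives only means, standard deviations and correlations, so the partial regression coefficients come from b₃₁.₂ = (s₃/s₁)(r₁₃ − r₁₂r₂₃)/(1 − r₁₂²) and b₃₂.₁ = (s₃/s₂)(r₂₃ − r₁₂r₁₃)/(1 − r₁₂²). The Python sketch below applies them. The printed estimates 8.362 and 6.586 come from coefficients first rounded to three places; at full precision the estimates are about 8.359 and 6.584.

```python
# Summary statistics from Exercise 13
m1, m2, m3 = 6.8, 7.0, 7.4
s1, s2, s3 = 1.0, 0.80, 0.90
r12, r13, r23 = 0.60, 0.70, 0.65

den = 1 - r12 ** 2
b31_2 = (s3 / s1) * (r13 - r12 * r23) / den   # coefficient of X1
b32_1 = (s3 / s2) * (r23 - r12 * r13) / den   # coefficient of X2
a = m3 - b31_2 * m1 - b32_1 * m2              # plane passes through the means


def estimate(x1, x2):
    """Estimated final grade X3 from the two quiz grades."""
    return a + b31_2 * x1 + b32_1 * x2


print(f"X3 = {a:.4f} + {b31_2:.4f} X1 + {b32_1:.4f} X2")
print(round(estimate(9, 7), 3), round(estimate(4, 8), 3))
```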


14. (a) Explain multiple and partial correlations.

(b) In a trivariate distribution
σ₁ = 2, σ₂ = σ₃ = 3,
r₁₂ = 0.7, r₂₃ = r₃₁ = 0.5.
Find (i) r₂₃.₁, (ii) R₁.₂₃, (iii) b₁₂.₃ and b₁₃.₂, (iv) σ₁.₂₃.
[(i) 0.2425, (ii) 0.7211, (iii) 0.4 and 0.133, (iv) 1.386]

15. In a trivariate distribution it is found that
σ₁ = 2.7, σ₂ = 2.4, σ₃ = 2.7,
r₁₂ = 0.28, r₂₃ = 0.49, r₃₁ = 0.51.
Find the regression equation of x₃ on x₁ and x₂, the variates being measured from their means. Also find r₃₁.₂. (B.A., Bombay, 1974)
[x₃ = 0.404x₁ + 0.4238x₂; r₃₁.₂ = 0.446]
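The printed answers to Exercise 14(b) are consistent with σ₁ = 2, σ₂ = σ₃ = 3, r₁₂ = 0.7 and r₂₃ = r₃₁ = 0.5, which is how the garbled data line has been read. Assuming those values, a Python sketch of the standard formulas for the partial and multiple correlation, the partial regression coefficients and σ₁.₂₃ reproduces all four answers:

```python
import math

s1, s2, s3 = 2.0, 3.0, 3.0
r12, r13, r23 = 0.7, 0.5, 0.5

r23_1 = (r23 - r12 * r13) / math.sqrt((1 - r12 ** 2) * (1 - r13 ** 2))          # (i)
R1_23 = math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))  # (ii)
b12_3 = (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)                          # (iii)
b13_2 = (s1 / s3) * (r13 - r12 * r23) / (1 - r23 ** 2)
s1_23 = s1 * math.sqrt(1 - R1_23 ** 2)                                          # (iv)

print(round(r23_1, 4), round(R1_23, 4), round(b12_3, 3), round(b13_2, 3), round(s1_23, 3))
```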


APPENDICES 


APPENDIX ONE 
Logarithms, Reciprocals and 
Square Root 


LOGARITHMS 

Logarithms are of great significance in statistical work. They simplify calculations and enable us to evaluate quantities that would otherwise be very laborious to compute. In statistical work they are particularly useful in constructing ratio graphs, computing the geometric mean, finding powers and extracting roots, and fitting trend lines.

The logarithm of a given number to a certain base is the power to which the base must be raised in order to obtain that number. Thus the logarithm (or simply log) of 4096 to base 8 is 4. It means that if 8 (the base) is raised to the power 4 we get the required number: 8 × 8 × 8 × 8 = 4096.

The system of logarithms commonly used is to the base 10. Hence when we talk of the logarithm of a number we mean the power to which 10 must be raised in order to obtain that number. Thus the logarithm of 10 is 1 because 10¹ = 10; the logarithm of 100 is 2 because 10² = 100; the logarithm of 1000 is 3 because 10³ = 1000. It is easy to find the logarithms of such numbers as 10, 100 and 1000. But when we want to find the log of, say, 95 or 168.4 we have to consult the log tables given at the end of the text.

The logarithm of a number consists of two parts: (i) the characteristic and (ii) the mantissa. The integral part, which refers to the integral power of 10, is called the 'characteristic' of the logarithm, and the fractional part is known as the 'mantissa' of the logarithm.

Finding the Characteristic 

The following rules are applicable for determining the characteristic of a number.

(i) The characteristic of any number greater than unity is positive and is less by one than the number of figures to the left of the decimal point.

(ii) The characteristic of a number less than unity is negative 
and is greater by one than the number of zeros which follow the 
decimal point. The characteristic for some of the numbers is 
given below : 


Number Characteristic
8 0
82 1
827 2
8275 3
82758 4
.8 −1 or 1̄
.08 −2 or 2̄
.008 −3 or 3̄
.0008 −4 or 4̄
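The rules for the characteristic amount to rounding the logarithm down to the nearest integer (the floor), with the mantissa as the non-negative remainder. A minimal Python sketch using the standard math module reproduces the table above and shows why the mantissa does not depend on the position of the decimal point:

```python
import math


def characteristic_and_mantissa(x):
    """Split log10(x) of a positive number into an integer characteristic and a mantissa in [0, 1)."""
    log_x = math.log10(x)
    characteristic = math.floor(log_x)   # negative for numbers below 1
    mantissa = log_x - characteristic    # never negative
    return characteristic, mantissa


for number in [8, 82, 827, 8275, 82758, 0.8, 0.08, 0.008, 0.0008]:
    c, m = characteristic_and_mantissa(number)
    written = f"{-c}\u0304" if c < 0 else str(c)   # e.g. -2 is written as 2 with a bar
    print(f"{number:>8}: characteristic {written:>2}, mantissa {m:.4f}")
```

Floating-point logarithms of exact powers of ten can land a hair below the integer, so a more careful version would guard that case; the sketch simply avoids such inputs.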


Conventionally the minus sign of the characteristic is written over the figure and is read as 1 bar, 2 bar, etc. It will be clear from the above that no table need be consulted for determining the characteristic.


Finding the Mantissa

The fraction value is known as mantissa of logarithm. The 
mantissa value is determined by consulting specially prepared log 
tables. Two things must be remembered about mantissa :— 

(i) Mantissa is always positive. 

(ii) The mantissa is not affected by the position of the decimal point. Thus the mantissa of 625, 62.5, 6.25, .625 and .0625 will be the same.


In order to find the mantissa of a given number we should reduce it to 4 digits by approximation, if necessary, because beyond this we cannot consult the log table. The first two digits are located in the left-hand vertical column and the third digit in the top horizontal row; the entry where they meet is read off. To the figure thus obtained, the quantity appearing under the 'mean difference' column for the fourth digit is added. Thus if we want to obtain the logarithm of 1267.8 we shall approximate it to four digits, i.e., 1268. The characteristic of this number is 3. For the mantissa, consult the log table: the entry in row 12 under column 6 is 1004. To this add the mean difference for 8, i.e., 27. So the required logarithm is 3.1031.
The following are the logs of a few numbers:


Number Logarithm
5732 3.7584
573 2.7582
57.3 1.7582
5.73 0.7582
.573 1̄.7582
.057 2̄.7559
.008 3̄.9031


Finding the Antilogarithm
If we wish to find the number corresponding to a given logarithm, the table of antilogarithms has to be consulted. Only the mantissa, i.e., the digits after the decimal point, is looked up in the table, and the procedure for consulting antilog tables is the same as for log tables. The position of the decimal point is then fixed by the characteristic: for a positive characteristic, the number of digits before the decimal point is one more than the characteristic; for a negative characteristic n̄, the number of zeros immediately after the decimal point is one less than n.
Illustration 1.
Find the antilog of 2.9846.
Solution: From antilog tables, entry for .984 = 9638
Mean difference for 6 = 13
Entry for .9846 = 9651
The decimal point is placed after 3 digits, i.e., one more than the characteristic. Thus the required number is 965.1, and the log of this number is 2.9846.
*Significant figure means the first non-zero digit; for example, if the figure is .003, then the significant figure is 3.
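In code the antilog is simply 10 raised to the full logarithm, with a negative (bar) characteristic entered as a negative integer. The Python sketch below uses an illustrative helper name. It shows Illustration 1 and two bar-characteristic cases; at full precision Illustration 1 gives about 965.16 where four-figure tables give 965.1.

```python
def antilog(characteristic, mantissa):
    """Number whose common logarithm is characteristic + mantissa.

    A bar characteristic is passed as a negative integer: 3-bar.9031 is antilog(-3, 0.9031).
    """
    return 10 ** (characteristic + mantissa)


print(antilog(2, 0.9846))    # Illustration 1
print(antilog(-3, 0.9031))   # about .008
print(antilog(-1, 0.7582))   # about .5731
```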
The antilogs of certain numbers are:

Number Log Antilog
5732 3.7584 5733
573 2.7582 573.1
57.3 1.7582 57.31
5.73 0.7582 5.731
.573 1̄.7582 .5731
.057 2̄.7559 .0570
.008 3̄.9031 .0080


Use of Logarithms 

Logarithms are extremely useful in statistical work as shall be 
clear from the following examples : 

(i) Multiplication. To find the product of two or more numbers, take the sum of their logarithms and find the antilogarithm of the sum. It should be noted that when logs are added the mantissa is always positive; the characteristic may be either positive or negative. Symbolically,

A × B = Antilog [log A + log B].
Illustration 2.
Multiply 532 by 116.
Solution: 532 × 116 = Antilog [log 532 + log 116]
= Antilog [2.7259 + 2.0645]
= Antilog [4.7904] = 61720.
Illustration 3.
Solve 84.05 × 0.1357 × 1.163.
Solution:
84.05 × 0.1357 × 1.163 = AL [log 84.05 + log 0.1357 + log 1.163]
= AL [1.9246 + 1̄.1325 + 0.0656]
= AL [1.1227] = 13.26.
Illustration 4.
Multiply 5872 by .058.
Solution: 5872 × .058 = Antilog [log 5872 + log .058]
= Antilog [3.7688 + 2̄.7634]
= Antilog [2.5322] = 340.6.
(ii) Division. To divide one number by another, subtract the logarithm of the divisor from the logarithm of the dividend and take the antilog of the difference. Symbolically,

A ÷ B = Antilog [log A − log B]

Illustration 5.
Divide 792 by 97.
Solution: 792/97 = Antilog [log 792 − log 97]
= Antilog [2.8987 − 1.9868]
= Antilog [0.9119] = 8.164.
Illustration 6.
Divide .0962 by .25.
Solution: .0962/.25 = Antilog [log .0962 − log .25]
= Antilog [2̄.9832 − 1̄.3979]
= Antilog [1̄.5853] = .3849.
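The multiplication and division rules can be checked against full-precision logarithms. In the Python sketch below (helper names are illustrative) the results agree with direct multiplication and division; the tabled answers above differ only in the fourth figure because four-figure tables are used.

```python
import math


def log_product(*numbers):
    """Product of positive numbers via the sum of their common logarithms."""
    return 10 ** sum(math.log10(x) for x in numbers)


def log_quotient(a, b):
    """a / b via the difference of common logarithms."""
    return 10 ** (math.log10(a) - math.log10(b))


print(log_product(532, 116))              # Illustration 2 (tables: 61720)
print(log_product(84.05, 0.1357, 1.163))  # Illustration 3 (tables: 13.26)
print(log_product(5872, 0.058))           # Illustration 4 (tables: 340.6)
print(log_quotient(792, 97))              # Illustration 5 (tables: 8.164)
print(log_quotient(0.0962, 0.25))         # Illustration 6 (tables: .3849)
```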
(iii) Raising a number to a certain power. To raise a number to a certain power, multiply the logarithm of the number by the exponent and find the antilog of the product. Symbolically,

Aˣ = Antilog [x log A]
Illustration 7.
Solve (6.86)⁵.
Solution: (6.86)⁵ = Antilog [5 log 6.86]
= Antilog [5 × .8363]
= Antilog [4.1815] = 15190.

(iv) Extracting the root of a given number. To extract a root of a given number, divide the logarithm of the number by the index of the root and take the antilog of the result. Symbolically,
ⁿ√A = Antilog [(log A)/n]

Illustration 8.
Simplify √12765.4.
Solution: Let x = √12765.4
log x = ½ log 12765.4
log x = ½ × 4.1062
log x = 2.0531
x = AL 2.0531 = 113.0


Illustration 9.
Simplify ⁵√.00892.
Solution: ⁵√.00892 = Antilog [⅕ log .00892]
= Antilog [⅕ (3̄.9504)]
= Antilog [⅕ (5̄ + 2.9504)] = Antilog [1̄.5901] = .3891
Illustration 10.
Simplify 529.5 / √(644.9 × 592.5).
Solution: Let x = 529.5 / √(644.9 × 592.5)
log x = log 529.5 − ½ [log 644.9 + log 592.5]
= 2.7239 − ½ [2.8095 + 2.7727]
= 2.7239 − ½ [5.5822]
= 2.7239 − 2.7911 = 1̄.9328
x = AL 1̄.9328 = .8567
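Powers and roots follow the same pattern. The Python sketch below uses hypothetical helper names and reproduces Illustrations 7 to 10 at full precision. Note that math.log10 returns an ordinary negative number, so the bar-characteristic adjustment used in Illustration 9 (writing 3̄.9504 as 5̄ + 2.9504) is not needed in code.

```python
import math


def log_power(a, x):
    """a ** x computed as antilog(x * log a)."""
    return 10 ** (x * math.log10(a))


def log_root(a, n):
    """n-th root of a computed as antilog(log a / n)."""
    return 10 ** (math.log10(a) / n)


print(log_power(6.86, 5))                  # Illustration 7 (tables: 15190)
print(log_root(12765.4, 2))                # Illustration 8 (tables: 113.0)
print(log_root(0.00892, 5))                # Illustration 9 (tables: .3891)
print(529.5 / math.sqrt(644.9 * 592.5))    # Illustration 10 (tables: .8567)
```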


Illustration 11.
Solve (2.58 × 3.464) / 3.09.
Solution: Let x = (2.58 × 3.464) / 3.09
log x = log 2.58 + log 3.464 − log 3.09
= .4116 + .5396 − .4900 = .4612; x = AL .4612 = 2.892


Illustration 12.
Solve 80 × 210 × (.7825)⁶ × (.2175)⁴.
Solution: Let x = 80 × 210 × (.7825)⁶ × (.2175)⁴
log x = log 80 + log 210 + 6 log .7825 + 4 log .2175
= 1.9031 + 2.3222 + 6 × 1̄.8935 + 4 × 1̄.3375
= 1.9031 + 2.3222 + 1̄.3610 + 3̄.3500
= 4.2253 + 4̄.7110 = .9363; x = AL .9363 = 8.636
Illustration 13.
Solve (1 + r)¹⁸ = 52/30.
Solution: Taking logs of both sides,
18 log (1 + r) = log 52 − log 30
18 log (1 + r) = 1.7160 − 1.4771
18 log (1 + r) = .2389
log (1 + r) = .01327
1 + r = AL .01327 = 1.031; r = .031
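Illustrations 11 to 13 combine the rules. A short Python check at full precision gives about 2.892, 8.631 and r ≈ 0.031; the tabled 8.636 in Illustration 12 differs in the third decimal place because of the rounding in four-figure tables.

```python
import math

# Illustration 11: (2.58 x 3.464) / 3.09
x11 = 10 ** (math.log10(2.58) + math.log10(3.464) - math.log10(3.09))
# Illustration 12: 80 x 210 x (.7825)^6 x (.2175)^4
x12 = 10 ** (math.log10(80) + math.log10(210) + 6 * math.log10(0.7825) + 4 * math.log10(0.2175))
# Illustration 13: (1 + r)^18 = 52 / 30, so log(1 + r) = (log 52 - log 30) / 18
r = 10 ** ((math.log10(52) - math.log10(30)) / 18) - 1

print(round(x11, 3), round(x12, 3), round(r, 4))
```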
RECIPROCALS 
Quite often we have to find reciprocals in statistical work. The reciprocal of a given number is defined as one divided by that number. Thus the reciprocal of 5 is 1/5 = .2 and the reciprocal of 10 is 1/10 = .1. To find reciprocals, we can consult reciprocal tables. Just as with logarithms, the given figure is first approximated to four digits. The first two digits are located in the left-hand vertical column, the third digit in the top horizontal row and the fourth digit in the mean-difference columns. It should be carefully noted that the figure appearing in the mean-difference column is not to be added but subtracted, because the reciprocal decreases as the number increases. It must also be remembered that if the decimal point moves by one digit to the right in the given number, it moves by one digit to the left in the reciprocal. The reciprocals of certain numbers are given below:


Number Reciprocal | Number Reciprocal
8 .1250 | 0.584 1.712
20 .0500 | 0.046 21.74
225 .0044 | 0.007 142.9
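A reciprocal table is simply a lookup for 1/x, so its entries and the decimal-point rule are easy to check in Python. The sketch below prints full-precision values rounded to four decimal places; the table above rounds some of them to four significant figures (for example 21.74 and 142.9).

```python
# Reciprocals of the numbers in the table above, rounded to four decimal places.
numbers = [8, 20, 225, 0.584, 0.046, 0.007]
reciprocals = {x: round(1 / x, 4) for x in numbers}
print(reciprocals)

# Moving the decimal point one place right in x moves it one place left in 1/x.
print(1 / 58.4, (1 / 5.84) / 10)
```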


EXTRACTING SQUARE ROOT 
Quite often we have to take square roots while making calculations. There are three ways in which the square root of a number can be taken:
(i) By consulting tables. The square root of a number can be determined by referring to the tables of square roots.

(ii) By using logarithms. When logarithms are used, the square root of a given number is the antilog of the logarithm of that number divided by 2.

Illustration 14. Find the square root of 256.
Solution: √256 = (256)^½ = Antilog [½ log 256]
= Antilog [½ × 2.4082] = Antilog [1.2041] = 16.


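(iii) Without tables. When this method is used, no table is consulted. Instead, the digits of the number are marked off in pairs: for the integral part we start at the decimal point and move to the left, and for the fractional part we start at the decimal point and move to the right. The greatest square not exceeding the first pair (or the single leading digit, if it does not form a pair) is determined and subtracted. The next pair is then brought down, and the next digit of the root is the largest digit which, appended to twice the root found so far and multiplied by itself, does not exceed the new remainder. The following example will illustrate the procedure.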
Illustration 15.

Take the square root of 369.732.

Solution:
         19.228
    1 | 3 69.73 20 00
      | 1
   29 | 2 69
      | 2 61
  382 |    8 73
      |    7 64
 3842 |    1 09 20
      |      76 84
38448 |      32 36 00
      |      30 75 84
      |       1 60 16

Thus √369.732 = 19.228 (to three decimal places).
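The pairing method can be written as a short loop: bring down one pair at a time, and find the largest digit d for which (20 × root so far + d) × d does not exceed the current remainder, which is the "double the root and append the digit" step done by hand. The Python sketch below is an illustrative implementation (the function name is hypothetical) that truncates the root to a given number of decimal places; it reproduces Illustration 15.

```python
def long_division_sqrt(number: str, decimals: int) -> str:
    """Square root by the pairing method, truncated to `decimals` places (decimals >= 1)."""
    int_part, _, frac_part = number.partition(".")
    if len(int_part) % 2:
        int_part = "0" + int_part                   # pair digits leftwards from the decimal point
    frac_part = frac_part.ljust(2 * decimals, "0")  # pair digits rightwards, padding with zeros
    pairs = [int(int_part[i:i + 2]) for i in range(0, len(int_part), 2)]
    pairs += [int(frac_part[i:i + 2]) for i in range(0, 2 * decimals, 2)]

    root, remainder = 0, 0
    for pair in pairs:
        remainder = remainder * 100 + pair          # bring down the next pair
        digit = 9
        # trial divisor: twice the root so far, with the new digit appended
        while (20 * root + digit) * digit > remainder:
            digit -= 1
        remainder -= (20 * root + digit) * digit
        root = root * 10 + digit

    digits = str(root).rjust(decimals + 1, "0")
    return digits[:-decimals] + "." + digits[-decimals:]


print(long_division_sqrt("369.732", 3))   # Illustration 15
print(long_division_sqrt("256", 1))
```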
RULES OF SIGNS 
Addition. While adding a series containing both plus and 
minus quantities, add separately the plus values and the minus 
values and take the difference between two sub-totals and give the 
difference, the sign of the numerically larger sub-total. 
For example, 8-F124-15— 12—164-13—30 
:48—58——10 
Subtraction. While subtracting numbers, change the sign of 
the number to be subtracted and proceed as above 
For Example, 10—(—5)—1045 = 1; 
10—(t5)—10—5 = 5 
5—(+10)=5—10 = —5 
Multiplication. When two positive quantities are multiplied, the result is always positive. When two negative quantities are multiplied, the result is also positive. But when a positive quantity is multiplied by a negative quantity, the result is negative.
For example, $(+4) \times (+3) \times (+5) = 60$
$(-6) \times (-5) \times (+2) = 60$
$(+6) \times (+5) \times (-2) = -60$
Division. When dividing one quantity by another, it should be remembered that like signs, whether positive or negative, give a positive result, whereas unlike signs, i.e., one positive and one negative, give a negative result.

For example, $(+50) \div (+25) = +2$
$(-50) \div (-25) = +2$
$(-50) \div (+25) = -2$
$(+50) \div (-25) = -2$


Note. Although taking reciprocals, logarithms and antilogarithms with the help of tables simplifies the calculations, it introduces an error of approximation, since the logarithm and reciprocal tables at the end of this text are correct only up to four decimal places.


APPENDIX TWO


Permutations and Combinations


The knowledge of permutations and combinations is extremely helpful in solving a wide variety of problems relating to probability, theoretical distributions, etc.

Permutations 

A permutation of n different objects taken r at a time is an arrangement of r out of the n objects with attention given to the order of arrangement. The number of permutations of n objects taken r at a time is denoted by ${}^nP_r$, $P_{n,r}$ or $P(n, r)$ and is given by:

${}^nP_r = n(n-1)(n-2)\cdots(n-r+1) = \dfrac{n!}{(n-r)!}$

In particular, the number of permutations of n objects taken n at a time is:

${}^nP_n = n(n-1)(n-2)\cdots 1 = n!$

Factorial n, denoted by n!, is defined as

$n! = n(n-1)(n-2)\cdots 1$

Thus $4! = 4 \times 3 \times 2 \times 1 = 24$. It may be noted that $0! = 1$.
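A short Python sketch of the two definitions above, written for illustration: the helper `n_p_r` (a hypothetical name) multiplies the r descending factors $n(n-1)\cdots(n-r+1)$, while the standard-library functions `math.factorial` and `math.perm` (Python 3.8 or later) and `itertools.permutations` are used only as cross-checks.

```python
import math
from itertools import permutations

def n_p_r(n: int, r: int) -> int:
    """nPr = n(n-1)(n-2)...(n-r+1), the product of r descending factors."""
    product = 1
    for factor in range(n, n - r, -1):
        product *= factor
    return product

# nPr agrees with n!/(n-r)! and with the library function math.perm
assert n_p_r(4, 2) == math.factorial(4) // math.factorial(2) == math.perm(4, 2)
# nPn = n!, and 0! = 1 by definition
assert n_p_r(4, 4) == math.factorial(4) == 24 and math.factorial(0) == 1
# Listing the ordered arrangements directly gives the same count
print(n_p_r(4, 2), len(list(permutations("abcd", 2))))  # 12 12
```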

Illustration 1. Find the number of permutations of the letters a, b, c, d taken two at a time.

Solution. Here the number of letters is n = 4 and r = 2.

${}^nP_r = {}^4P_2 = 4 \times 3 = 12$

This can easily be seen from the following permutations formed of the letters a, b, c, d taken 2 at a time:

ab, ac, ad, bc, bd, cd
ba, ca, da, cb, db, dc

Illustration 2. Three travellers arrive at a town where there are five hotels. In how many ways can they select their rooms, each at a different hotel?

Solution. The first traveller has a choice of 5 hotels, the second a choice of 4 hotels and the third a choice of only 3 hotels. Hence the required number of ways is ${}^nP_r = {}^5P_3 = 5 \times 4 \times 3 = 60$.

Illustration 3. In how many ways can 6 differently coloured marbles be arranged in a row?

Solution. We must arrange the 6 marbles in 6 positions, thus: _ _ _ _ _ _

There are 6 ways of filling the first position, 5 ways of filling the second position, and so on. Therefore, the number of arrangements of 6 marbles in a row

$= 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 6! = 720$




Illustration 4. In how many ways can 8 people be seated on a bench if only 3 seats are available?

Solution. The first seat can be filled in any one of 8 ways, the second in 7 ways and the third in 6 ways. Therefore the number of arrangements of 8 people taken 3 at a time is ${}^8P_3 = 8 \times 7 \times 6 = 336$.

It should be noted that the number of permutations of n objects consisting of groups of which $n_1$ are alike, $n_2$ are alike, ..., $n_k$ are alike is:

$\dfrac{n!}{n_1!\,n_2!\cdots n_k!}$, where $n = n_1 + n_2 + \cdots + n_k$

Illustration 5. Find the number of permutations of the letters in the word statistics.

Solution. The total number of letters in the word statistics is 10, of which there are 3 s's, 3 t's, 1 a, 2 i's and 1 c. Hence the required number is given by

$\dfrac{10!}{3!\,3!\,1!\,2!\,1!} = 50{,}400$
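The formula for arrangements with alike objects can be checked with a small Python sketch (an illustration only; `arrangements_with_repeats` is a hypothetical helper). It counts the alike letters with `collections.Counter` and divides n! by the factorial of each count.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def arrangements_with_repeats(word: str) -> int:
    """n! / (n1! n2! ... nk!) where n1, n2, ... count the repeated letters."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)   # each division leaves a whole number
    return result

print(arrangements_with_repeats("statistics"))  # 50400
# Brute-force check on a short word: distinct orderings of "aabb"
print(len(set(permutations("aabb"))), arrangements_with_repeats("aabb"))  # 6 6
```

The brute-force check is only practical for short words; for "statistics" it would list all 3,628,800 orderings.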
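Combinations

A combination of n different objects taken r at a time is a selection of r out of the n objects with no attention given to the order of arrangement. The number of combinations of n objects taken r at a time is denoted by ${}^nC_r$, $C(n, r)$, $C_{n,r}$ or $\binom{n}{r}$ and is given by:

${}^nC_r = \dfrac{n(n-1)(n-2)\cdots(n-r+1)}{r!} = \dfrac{n!}{r!\,(n-r)!}$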

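The combination formula can be sketched in Python as below, for illustration only. The helper `n_c_r` is hypothetical; it builds the quotient one factor at a time so that every intermediate value is a whole number, and it uses the relation ${}^nC_r = {}^nC_{n-r}$ to keep the loop short. `math.comb` (Python 3.8 or later) and `itertools.combinations` serve only as cross-checks.

```python
import math
from itertools import combinations

def n_c_r(n: int, r: int) -> int:
    """nCr = n(n-1)...(n-r+1) / r!, computed without fractions (0 <= r <= n)."""
    r = min(r, n - r)                  # nCr = nC(n-r)
    result = 1
    for k in range(1, r + 1):
        # After this step, result equals C(n - r + k, k), always a whole number
        result = result * (n - r + k) // k
    return result

print(n_c_r(4, 2), ["".join(c) for c in combinations("abcd", 2)])
# 6 ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
print(n_c_r(10, 4), n_c_r(20, 5), math.comb(20, 5))  # 210 15504 15504
```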

Illustration 6. Find the number of combinations of the letters a, b, c, d taken 2 at a time.

Solution. The number of combinations of the letters a, b, c, d taken 2 at a time is ${}^4C_2 = \dfrac{4 \times 3}{2 \times 1} = 6$. These are: ab, ac, ad, bc, bd, cd. It may be noted that ab is the same combination as ba but not the same permutation.

Illustration 7. In how many ways can a committee of 4 persons be chosen out of 10?

Solution. The required number of ways is given by

${}^{10}C_4 = \dfrac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 1} = 210$


Illustration 8. In how many ways can a random sample of 5 cities be drawn from a total of 20?

Solution. The required number of ways is given by

${}^{20}C_5 = \dfrac{20 \times 19 \times 18 \times 17 \times 16}{5 \times 4 \times 3 \times 2 \times 1} = 15{,}504$

It should be noted that ${}^nC_r = {}^nC_{n-r}$.* For example, ${}^{20}C_{17} = {}^{20}C_3 = \dfrac{20 \times 19 \times 18}{3 \times 2 \times 1} = 1140$. This relationship greatly helps in simplifying calculations.

Illustration 9. In how many ways can 4 red balls be drawn from a bag containing 10 red balls?

Solution. The required number of ways $= {}^{10}C_4 = \dfrac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 1} = 210$.

Relationship between permutations and combinations.

From the above description it is clear that

${}^nC_r = \dfrac{{}^nP_r}{r!}$, since ${}^nP_r = \dfrac{n!}{(n-r)!}$ and ${}^nC_r = \dfrac{n!}{r!\,(n-r)!}$

Or ${}^nP_r = {}^nC_r \times r!$

This can be illustrated with the help of an example:

Illustration 10. How many permutations and combinations can be obtained from 6 objects taken 3 at a time?

Solution. Permutations are given by ${}^nP_r = {}^6P_3 = 6 \times 5 \times 4 = 120$

Combinations are given by ${}^nC_r = {}^6C_3 = \dfrac{6 \times 5 \times 4}{3 \times 2 \times 1} = 20$

Thus the number of permutations is 3!, i.e., 6 times the number of combinations:

${}^6P_3 = {}^6C_3 \times 3! = 20 \times 3! = 120$


*Proof: ${}^nC_{n-r} = \dfrac{n!}{(n-r)!\,[n-(n-r)]!} = \dfrac{n!}{(n-r)!\,r!}$

Hence ${}^nC_r = {}^nC_{n-r}$.
APPENDIX FOUR

Symbols, Abbreviations and Formulae
SYMBOLS AND ABBREVIATIONS


$A$ = Assumed mean
$b_1$ or $b_{xy}$ = Regression coefficient of X on Y
$b_2$ or $b_{yx}$ = Regression coefficient of Y on X
$C$ = Common factor
c.f. = Cumulative frequency
Coeff. = Coefficient
$d = (X - A)$, i.e., deviation of a value X from an assumed mean
$d' = \dfrac{X - A}{C}$, i.e., deviation of a value X from an assumed mean taking a common factor
$|D|$ = Deviation of items from median (or mean) ignoring signs
d.f. = Degrees of freedom
$D_1, D_2, D_3$, etc. = 1st, 2nd, 3rd deciles, etc.
$E$ = Expected value
$f$ = Frequency
$f_o$ = Observed frequency
$f_e$ = Expected frequency
$f_1, f_2, f_3$, etc. = Frequency of first, second, third classes, etc.
G.M. = Geometric mean
H.M. = Harmonic mean
$i$ = Class interval of a class
log = Logarithm
$L$ = Lower limit of a class
$U$ = Upper limit of a class
$m$ = Mid-point of a class
Med. = Median
Mo = Mode
M.D. = Mean deviation
$N$ = Number of observations or sum of frequencies, i.e., $\Sigma f$ in case of a frequency distribution
$O$ = Observed value
$p$ = Probability of happening of an event
$P_1, P_2, P_3$, etc. = First, second, third percentiles, etc.
$p_0$ = Price in the base year
$p_1$ = Price in the current year
$Q_1, Q_2, Q_3$ = First, second, third quartiles
$q_0$ = Quantity in the base year
$q_1$ = Quantity in the current year
$Q$ = Yule's coefficient of association
Q.D. = Quartile deviation
$r$ = Coefficient of correlation
$R$ = Rank correlation coefficient
$r_c$ = Correlation by concurrent deviation method
$r^2$ = Coefficient of determination
Sk = Coefficient of skewness
C.V. = Coefficient of variation




$W$ = Weights
$X_1, X_2, X_3$, etc. = Individual observations of the variable X
$\bar X$ = Arithmetic mean
$x = (X - \bar X)$, i.e., deviations of items from actual mean
$x' = \dfrac{X - \bar X}{C}$, i.e., deviations of items from mean taking a common factor, i.e., step deviations
$\bar X_{12}$ = Combined mean of two series or groups
$\bar X_{123}$ = Combined mean of three series or groups
$\bar X_w$ = Weighted arithmetic mean
$Y_1, Y_2, Y_3$, etc. = Individual observations of the variable Y
$\bar Y$ = Arithmetic mean of Y series
$y = (Y - \bar Y)$
$z = \dfrac{X - \bar X}{\sigma}$ = Standard normal variate
$\Sigma$ = Summation, or "the sum of"

Greek Letters
The following Greek letters are very popularly used in statistical work:
$\sigma$ (pronounced sigma) = Standard deviation
$\sigma^2$ = Variance
$\beta_1$ (pronounced beta one) = A measure of skewness
$\beta_2$ (pronounced beta two) = A measure of kurtosis
$\chi^2$ (pronounced chi-square) = A test of goodness of fit
$\mu_1, \mu_2$, etc. (pronounced mu one, mu two, etc.) denote the first, second, etc. moments about the mean
$\mu_1', \mu_2'$, etc. denote the first, second, etc. moments about an arbitrary origin
$\nu_1, \nu_2, \nu_3$, etc. (pronounced nu one, nu two, nu three, etc.) denote the first, second, third, etc. moments about zero
FORMULAE


(Measures of Central Value)
Arithmetic Mean

Individual Observations
$\bar X = \dfrac{\Sigma X}{N}$, where $\Sigma X = X_1 + X_2 + X_3 + \cdots$ and N = number of observations

Discrete Series
$\bar X = \dfrac{\Sigma fX}{N}$ (Direct method)
$\bar X = A + \dfrac{\Sigma fd}{N}$ (Assumed mean method), where $d = (X - A)$

Continuous Series
$\bar X = \dfrac{\Sigma fm}{N}$ (Direct method), where m = mid-point


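$\bar X = A + \dfrac{\Sigma fd'}{N} \times C$ (Step deviation method), where $d' = \dfrac{m - A}{C}$ and C = common factor

Combined Mean
$\bar X_{12} = \dfrac{N_1\bar X_1 + N_2\bar X_2}{N_1 + N_2}$
$\bar X_{123} = \dfrac{N_1\bar X_1 + N_2\bar X_2 + N_3\bar X_3}{N_1 + N_2 + N_3}$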


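Median, Quartiles, etc.
Individual Observations and Discrete Series
Median = Size of $\left(\dfrac{N+1}{2}\right)$th item
$Q_1$ = Size of $\left(\dfrac{N+1}{4}\right)$th item
$Q_3$ = Size of $3\left(\dfrac{N+1}{4}\right)$th item

Continuous Series
Median $= L + \dfrac{N/2 - \text{c.f.}}{f} \times i$
$Q_1 = L + \dfrac{N/4 - \text{c.f.}}{f} \times i$
$Q_3 = L + \dfrac{3N/4 - \text{c.f.}}{f} \times i$
where L = lower limit of the median (or quartile) class
c.f. = cumulative frequency of the class preceding the median (or quartile) class
f = simple frequency of the median (or quartile) class
i = class interval of the median (or quartile) class
N = $\Sigma f$, i.e., total frequency in a discrete and continuous series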
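For grouped data, the interpolation formula for the median and quartiles can be sketched in Python as follows. This is an illustration only: the helper `grouped_quantile` and the frequency table in the example are hypothetical, and equal class intervals are assumed.

```python
def grouped_quantile(lower_limits, freqs, width, fraction):
    """L + (fraction*N - c.f.) / f * i for the class containing the
    (fraction*N)th item: fraction = 1/2 for the median, 1/4 for Q1, 3/4 for Q3."""
    n = sum(freqs)
    target = fraction * n
    cum = 0                      # c.f. of the classes preceding the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= target:    # this is the median (or quartile) class
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("fraction must lie between 0 and 1")

# Hypothetical classes 0-10, 10-20, ..., 40-50
lower = [0, 10, 20, 30, 40]
freq = [5, 8, 12, 10, 5]
print(grouped_quantile(lower, freq, 10, 0.25))  # Q1 = 16.25
print(grouped_quantile(lower, freq, 10, 0.5))   # median, about 25.83
print(grouped_quantile(lower, freq, 10, 0.75))  # Q3 = 35.0
```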


Mode
Continuous Series
Mode $= L + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times i$
where L = lower limit of the modal class
$\Delta_1$ = difference between the frequency of the modal class and the pre-modal class
$\Delta_2$ = difference between the frequency of the modal class and the post-modal class
i = class interval of the modal class

Or

Mode $= L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
where L = lower limit of the modal class
$f_1$ = frequency of the modal class
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
i = class interval of the modal class

Mode = 3 Median $-$ 2 Mean
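The two forms of the mode formula are the same, since $\Delta_1 = f_1 - f_0$ and $\Delta_2 = f_1 - f_2$. A minimal Python sketch with a hypothetical modal class 20-30 ($f_1 = 12$, $f_0 = 8$, $f_2 = 10$); the function name is illustrative only.

```python
def grouped_mode(lower, f0, f1, f2, width):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i, i.e. L + D1 / (D1 + D2) * i."""
    d1 = f1 - f0     # excess over the pre-modal class
    d2 = f1 - f2     # excess over the post-modal class
    return lower + d1 / (d1 + d2) * width

# Hypothetical modal class 20-30 with f1 = 12, preceded by 8 and followed by 10
print(round(grouped_mode(20, 8, 12, 10, 10), 2))  # 26.67
```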
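Geometric Mean
Individual observations
G.M. $= \text{Antilog}\left(\dfrac{\Sigma\log X}{N}\right)$
Discrete Series
G.M. $= \text{Antilog}\left(\dfrac{\Sigma f\log X}{N}\right)$
Continuous Series
G.M. $= \text{Antilog}\left(\dfrac{\Sigma f\log m}{N}\right)$

Harmonic Mean
Individual observations
H.M. $= \dfrac{N}{\Sigma\left(\dfrac{1}{X}\right)}$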
(Measures of Variation)
1. Range
Range $= L - S$, where L = largest value and S = smallest value
Coefficient of Range $= \dfrac{L - S}{L + S}$
2. Quartile Deviation
Q.D. $= \dfrac{Q_3 - Q_1}{2}$
Coefficient of Q.D. $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
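3. Mean Deviation
Individual Observations
M.D. $= \dfrac{\Sigma|D|}{N}$
Discrete and Continuous Series
M.D. $= \dfrac{\Sigma f|D|}{N}$
where $|D|$ = deviations from median (or mean) ignoring signs.
When the mean or median is a fraction, the calculations can be simplified by applying the following formulae:
M.D. (from median) $= \dfrac{\Sigma fm_a - \Sigma fm_b - (\Sigma f_a - \Sigma f_b)\,\text{Med.}}{N}$
and
M.D. (from mean) $= \dfrac{\Sigma fm_a - \Sigma fm_b - (\Sigma f_a - \Sigma f_b)\,\bar X}{N}$
where the subscript a refers to the values (or mid-points m) above the median (or mean) and b to those below it.
Coefficient of M.D. $= \dfrac{\text{Mean Deviation}}{\text{Median}}$ (when deviations are taken from the median)
$= \dfrac{\text{Mean Deviation}}{\text{Mean}}$ (when deviations are taken from the mean)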
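The short-cut formula gives exactly the same result as the direct one, because items above the centre contribute $f(m - \text{centre})$ and items below contribute $f(\text{centre} - m)$. A small Python check with hypothetical data; the function names are illustrative, and the centre may be any value, such as a fractional median or mean.

```python
def md_direct(values, freqs, centre):
    """Sum f|X - centre| / N."""
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / sum(freqs)

def md_shortcut(values, freqs, centre):
    """[Sum fX above - Sum fX below - (Sum f above - Sum f below) * centre] / N."""
    above = [(x, f) for x, f in zip(values, freqs) if x > centre]
    below = [(x, f) for x, f in zip(values, freqs) if x <= centre]
    sum_fx_above = sum(f * x for x, f in above)
    sum_fx_below = sum(f * x for x, f in below)
    f_above = sum(f for _, f in above)
    f_below = sum(f for _, f in below)
    return (sum_fx_above - sum_fx_below - (f_above - f_below) * centre) / sum(freqs)

# Hypothetical mid-points and frequencies, with a fractional centre
mids, freqs = [5, 15, 25, 35, 45], [5, 8, 12, 10, 5]
print(md_direct(mids, freqs, 25.5), md_shortcut(mids, freqs, 25.5))
```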


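4. Standard Deviation
Individual Observations
$\sigma = \sqrt{\dfrac{\Sigma x^2}{N}}$, where deviations are taken from the actual mean, i.e., $x = (X - \bar X)$
$\sigma = \sqrt{\dfrac{\Sigma d^2}{N} - \left(\dfrac{\Sigma d}{N}\right)^2}$, where deviations are taken from an assumed mean, i.e., $d = (X - A)$

Discrete and Continuous Series
$\sigma = \sqrt{\dfrac{\Sigma fd^2}{N} - \left(\dfrac{\Sigma fd}{N}\right)^2}$, where deviations are taken from an assumed mean
$\sigma = \sqrt{\dfrac{\Sigma fd'^2}{N} - \left(\dfrac{\Sigma fd'}{N}\right)^2} \times C$, where step deviations are taken, $d' = \dfrac{X - A}{C}$ (or $\dfrac{m - A}{C}$) and C = common factor

Coefficient of Standard Deviation $= \dfrac{\sigma}{\bar X}$
Coefficient of Variation or C.V. $= \dfrac{\sigma}{\bar X} \times 100$
Variance $= \sigma^2$, or $\sigma = \sqrt{\text{Variance}}$
Variance $= \left[\dfrac{\Sigma fd'^2}{N} - \left(\dfrac{\Sigma fd'}{N}\right)^2\right] \times C^2$ (Continuous Series)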
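The step deviation method can be sketched in Python as below. It is an illustration under stated assumptions: the helper name is hypothetical, the class mid-points are equally spaced so that $d' = (m - A)/C$ is a small whole number, and σ is the population standard deviation (division by N), as in the formulae above. The direct formula is computed alongside as a check.

```python
import math

def mean_sd_step_deviation(mids, freqs, a, c):
    """Mean = A + (Sum fd'/N) * C and sigma = sqrt(Sum fd'^2/N - (Sum fd'/N)^2) * C,
    where d' = (m - A) / C."""
    n = sum(freqs)
    d = [(m - a) / c for m in mids]
    s1 = sum(f * di for f, di in zip(freqs, d))          # Sum fd'
    s2 = sum(f * di * di for f, di in zip(freqs, d))     # Sum fd'^2
    mean = a + s1 / n * c
    sd = math.sqrt(s2 / n - (s1 / n) ** 2) * c
    return mean, sd

# Hypothetical mid-points and frequencies, A = 25, C = 10
mids, freqs = [5, 15, 25, 35, 45], [5, 8, 12, 10, 5]
mean, sd = mean_sd_step_deviation(mids, freqs, 25, 10)
# Direct formula sqrt(Sum f(m - mean)^2 / N) for comparison
direct = math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / sum(freqs))
print(mean, round(sd, 4), round(direct, 4), round(sd / mean * 100, 2))  # last value: C.V.
```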


Combined Standard Deviation
$\sigma_{12} = \sqrt{\dfrac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1d_1^2 + N_2d_2^2}{N_1 + N_2}}$
where $d_1 = (\bar X_1 - \bar X_{12})$ and $d_2 = (\bar X_2 - \bar X_{12})$
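The combined mean and combined standard deviation can be checked numerically with a short Python sketch. The helper name and the two groups are hypothetical; `statistics.pstdev` (population standard deviation, dividing by N) is used so that σ matches the definition in this appendix, and `statistics.fmean` requires Python 3.8 or later.

```python
import math
from statistics import fmean, pstdev

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and sigma of two groups from their sizes, means and sigmas."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean12, mean2 - mean12
    sd12 = math.sqrt((n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2))
    return mean12, sd12

g1, g2 = [2, 4, 6], [10, 12, 14, 16]          # hypothetical groups
print(combined_mean_sd(len(g1), fmean(g1), pstdev(g1), len(g2), fmean(g2), pstdev(g2)))
print(fmean(g1 + g2), pstdev(g1 + g2))        # same values from the pooled data
```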


Standard Deviation of n natural numbers

$\sigma = \sqrt{\dfrac{N^2 - 1}{12}}$


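(Skewness, Moments and Kurtosis)
Absolute Skewness = Mean $-$ Mode, or $Q_3 + Q_1 - 2\,\text{Med.}$
Relative Skewness
(1) $Sk_P = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$ (Karl Pearson's method)
(2) $Sk_P = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$ (when mode is ill-defined)
(3) $Sk_B = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$ (Bowley's method)
(4) Skewness based on moments: $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, or $\gamma_1 = \sqrt{\beta_1}$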


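Moments
About the actual mean:
$\mu_1 = \dfrac{\Sigma(X - \bar X)}{N}$, $\mu_2 = \dfrac{\Sigma(X - \bar X)^2}{N}$ (i.e., $\sigma^2$), $\mu_3 = \dfrac{\Sigma(X - \bar X)^3}{N}$, $\mu_4 = \dfrac{\Sigma(X - \bar X)^4}{N}$
About an arbitrary origin A:
$\mu_1' = \dfrac{\Sigma(X - A)}{N}$, $\mu_2' = \dfrac{\Sigma(X - A)^2}{N}$, $\mu_3' = \dfrac{\Sigma(X - A)^3}{N}$, $\mu_4' = \dfrac{\Sigma(X - A)^4}{N}$

Conversion of moments about an arbitrary origin into moments about mean
$\mu_2 = \mu_2' - (\mu_1')^2$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$

Kurtosis
Kurtosis is measured by the coefficient $\beta_2$, or its derivative $\gamma_2$:
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, $\gamma_2 = \beta_2 - 3$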
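(Correlation)
(1) Karl Pearson's Method.
(a) When deviations are taken from actual means
$r = \dfrac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}}$
where r is the coefficient of correlation and $x = (X - \bar X)$, $y = (Y - \bar Y)$

(b) When deviations are taken from assumed means
$r = \dfrac{\Sigma d_xd_y - \dfrac{\Sigma d_x \times \Sigma d_y}{N}}{\sqrt{\Sigma d_x^2 - \dfrac{(\Sigma d_x)^2}{N}}\,\sqrt{\Sigma d_y^2 - \dfrac{(\Sigma d_y)^2}{N}}}$
where $d_x = (X - A_x)$ and $d_y = (Y - A_y)$

(c) Coefficient of correlation in a correlation table
$r = \dfrac{\Sigma fd_xd_y - \dfrac{(\Sigma fd_x)(\Sigma fd_y)}{N}}{\sqrt{\Sigma fd_x^2 - \dfrac{(\Sigma fd_x)^2}{N}}\,\sqrt{\Sigma fd_y^2 - \dfrac{(\Sigma fd_y)^2}{N}}}$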
Interpretation of the Pearsonian coefficient
$S.E._r = \dfrac{1 - r^2}{\sqrt N}$
$P.E._r = 0.6745 \times S.E._r = 0.6745\,\dfrac{1 - r^2}{\sqrt N}$
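The assumed-mean form of Karl Pearson's coefficient gives the same r whatever origins $A_x$ and $A_y$ are chosen. A minimal Python sketch with hypothetical data and a hypothetical helper name, which also computes the probable error $0.6745(1 - r^2)/\sqrt N$:

```python
import math

def pearson_r(xs, ys, ax=0.0, ay=0.0):
    """Karl Pearson's r using deviations from assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sdx, sdy = sum(dx), sum(dy)
    num = sum(a * b for a, b in zip(dx, dy)) - sdx * sdy / n
    den = (math.sqrt(sum(a * a for a in dx) - sdx**2 / n)
           * math.sqrt(sum(b * b for b in dy) - sdy**2 / n))
    return num / den

xs = [10, 12, 15, 19, 24]     # hypothetical paired data
ys = [40, 41, 48, 60, 62]
r = pearson_r(xs, ys)
r_shifted = pearson_r(xs, ys, ax=15, ay=48)    # any assumed means give the same r
pe = 0.6745 * (1 - r**2) / math.sqrt(len(xs))  # probable error of r
print(round(r, 4), round(r_shifted, 4), round(pe, 4))
```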


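(2) Rank Correlation.
(a) $R = 1 - \dfrac{6\Sigma D^2}{N^3 - N}$
where $D = (R_1 - R_2)$, i.e., difference of ranks.
(b) When ranks are repeated
$R = 1 - \dfrac{6\left[\Sigma D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{N^3 - N}$
where m is the number of items whose ranks are common; one correction term is added for each group of repeated ranks.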
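The rank formula with the correction for repeated ranks can be sketched in Python as follows. This is illustrative only and the helper names are hypothetical. Ranks are assigned from the largest value (rank 1), tied items receive the average of the ranks they occupy, and each group of m tied items adds $(m^3 - m)/12$ to $\Sigma D^2$.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1          # first position held by this value
        count = ordered.count(v)              # m, the number of tied items
        ranks.append(first + (count - 1) / 2)
    return ranks

def rank_correlation(xs, ys):
    """R = 1 - 6[Sum D^2 + Sum (m^3 - m)/12] / (N^3 - N)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    correction = 0.0
    for series in (xs, ys):
        for v in set(series):
            m = series.count(v)
            correction += (m**3 - m) / 12     # zero when a value is not repeated
    return 1 - 6 * (d2 + correction) / (n**3 - n)

# Hypothetical marks awarded by two judges; the first judge gives a tie
print(rank_correlation([10, 20, 20, 30], [1, 2, 3, 4]))
```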


(3) Concurrent Deviation Method.
$r_c = \pm\sqrt{\pm\dfrac{2c - n}{n}}$
where c = number of positive signs after multiplying $d_x$ with $d_y$, and n = number of pairs of deviations compared.
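(Simple Regression)
Regression line of Y on X is given by $Y = a + bX$
Regression line of X on Y is given by $X = a + bY$
Regression equation of Y on X
$Y - \bar Y = r\dfrac{\sigma_y}{\sigma_x}(X - \bar X)$
Regression equation of X on Y
$X - \bar X = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar Y)$
Regression coefficient of X on Y, i.e., $b_{xy}$ or $b_1$
$b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{\Sigma xy}{\Sigma y^2}$ (when deviations are taken from actual means)
Regression coefficient of Y on X, i.e., $b_{yx}$ or $b_2$
$b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{\Sigma xy}{\Sigma x^2}$ (when deviations are taken from actual means)
When deviations are taken from assumed means
$b_{xy} = \dfrac{\Sigma d_xd_y - \dfrac{\Sigma d_x \times \Sigma d_y}{N}}{\Sigma d_y^2 - \dfrac{(\Sigma d_y)^2}{N}}$
$b_{yx} = \dfrac{\Sigma d_xd_y - \dfrac{\Sigma d_x \times \Sigma d_y}{N}}{\Sigma d_x^2 - \dfrac{(\Sigma d_x)^2}{N}}$
$r = \sqrt{b_{xy} \times b_{yx}}$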
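The two regression coefficients and the relation $r = \sqrt{b_{xy} \times b_{yx}}$ can be checked with a short Python sketch (hypothetical data and helper name). The square root is given the common sign of the two coefficients, since r, $b_{xy}$ and $b_{yx}$ always share the same sign.

```python
import math

def regression_coefficients(xs, ys):
    """b_yx = Sum xy / Sum x^2 and b_xy = Sum xy / Sum y^2 (deviations from actual means)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx, b_xy = sxy / sxx, sxy / syy
    r = math.copysign(math.sqrt(b_xy * b_yx), sxy)
    # Regression line of Y on X: Y - mean_y = b_yx (X - mean_x), so a = mean_y - b_yx * mean_x
    intercept = my - b_yx * mx
    return b_yx, b_xy, r, intercept

print(regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```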
(Association of Attributes)

Nine-square table

           A          α          Total
B          (AB)       (αB)       (B)
β          (Aβ)       (αβ)       (β)
Total      (A)        (α)        N

(A) = (AB) + (Aβ)          (α) = (αB) + (αβ)
(B) = (AB) + (αB)          (β) = (Aβ) + (αβ)
N = (AB) + (Aβ) + (αB) + (αβ)

Methods of Studying Association:
(1) Comparison of observed and expected frequencies method.
Attributes A and B are called
(i) independent if $(AB) = \dfrac{(A) \times (B)}{N}$, i.e., the actual observation equals the expectation
(ii) positively associated if $(AB) > \dfrac{(A) \times (B)}{N}$
(iii) negatively associated if $(AB) < \dfrac{(A) \times (B)}{N}$
(2) Proportion Method.
Attributes A and B are called
(i) independent if $\dfrac{(AB)}{(B)} = \dfrac{(A\beta)}{(\beta)}$
(ii) positively associated if $\dfrac{(AB)}{(B)} > \dfrac{(A\beta)}{(\beta)}$
(iii) negatively associated if $\dfrac{(AB)}{(B)} < \dfrac{(A\beta)}{(\beta)}$

(3) Yule's Coefficient of Association.
$Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
(4) Coefficient of Colligation.
$Y = \dfrac{1 - \sqrt{\dfrac{(A\beta) \times (\alpha B)}{(AB) \times (\alpha\beta)}}}{1 + \sqrt{\dfrac{(A\beta) \times (\alpha B)}{(AB) \times (\alpha\beta)}}}$
(5) Coefficient of Contingency.
$C = \sqrt{\dfrac{\chi^2}{N + \chi^2}}$, where $\chi^2 = \Sigma\dfrac{(O - E)^2}{E}$
O refers to the observed frequencies.
E refers to the expected frequencies.

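(Index Numbers)
(1) Unweighted Methods.
(a) Simple Aggregative Method
$P_{01} = \dfrac{\Sigma p_1}{\Sigma p_0} \times 100$
$p_1$ = price of the current year, $p_0$ = price of the base year.
(b) Simple Average of Price Relatives Method
$P_{01} = \dfrac{\Sigma\left(\dfrac{p_1}{p_0} \times 100\right)}{N}$
(c) Geometric Mean of Price Relatives Method
$P_{01} = \text{Antilog}\left[\dfrac{\Sigma\log\left(\dfrac{p_1}{p_0} \times 100\right)}{N}\right]$
(2) Weighted Methods.
(a) Weighted Aggregative Methods.
(i) Laspeyre's Method
$P_{01} = \dfrac{\Sigma p_1q_0}{\Sigma p_0q_0} \times 100$
$q_0$ = quantity of the base period.
(ii) Paasche's Method
$P_{01} = \dfrac{\Sigma p_1q_1}{\Sigma p_0q_1} \times 100$
$q_1$ = quantity of the current period.
(iii) Dorbish and Bowley's Method
$P_{01} = \dfrac{\dfrac{\Sigma p_1q_0}{\Sigma p_0q_0} + \dfrac{\Sigma p_1q_1}{\Sigma p_0q_1}}{2} \times 100$, or $P_{01} = \dfrac{L + P}{2}$, where L and P are Laspeyre's and Paasche's indices
(iv) Fisher's Ideal Method
$P_{01} = \sqrt{\dfrac{\Sigma p_1q_0}{\Sigma p_0q_0} \times \dfrac{\Sigma p_1q_1}{\Sigma p_0q_1}} \times 100$
(v) Marshall-Edgeworth's Method
$P_{01} = \dfrac{\Sigma(q_0 + q_1)p_1}{\Sigma(q_0 + q_1)p_0} \times 100$
(vi) Kelly's Method
$P_{01} = \dfrac{\Sigma p_1q}{\Sigma p_0q} \times 100$, where q = fixed quantity weights
(b) Weighted Average of Price Relatives
$P_{01} = \text{Antilog}\left[\dfrac{\Sigma(\log P \times V)}{\Sigma V}\right]$
where $P = \dfrac{p_1}{p_0} \times 100$ and V = value weight, i.e., $p_0q_0$.
Quantity or Volume Index Numbers
$Q_{01} = \dfrac{\Sigma q_1p_0}{\Sigma q_0p_0} \times 100$, when Laspeyre's method is used,
where $Q_{01}$ = quantity index.
$Q_{01} = \dfrac{\Sigma q_1p_1}{\Sigma q_0p_1} \times 100$, when Paasche's method is used.
$Q_{01} = \sqrt{\dfrac{\Sigma q_1p_0}{\Sigma q_0p_0} \times \dfrac{\Sigma q_1p_1}{\Sigma q_0p_1}} \times 100$, when Fisher's method is used.
Time Reversal Test: $P_{01} \times P_{10} = 1$
Factor Reversal Test: $P_{01} \times Q_{01} = \dfrac{\Sigma p_1q_1}{\Sigma p_0q_0}$
Circular Test: $P_{01} \times P_{12} \times P_{20} = 1$
Consumer Price Index
(a) Aggregate Expenditure Method
Consumer Price Index (Cost of Living Index) $= \dfrac{\Sigma p_1q_0}{\Sigma p_0q_0} \times 100$
(b) Family Budget Method
Consumer Price Index $= \dfrac{\Sigma PV}{\Sigma V}$
where $P = \dfrac{p_1}{p_0} \times 100$ and $V = p_0q_0$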
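Laspeyre's, Paasche's and Fisher's price indices, together with the time reversal and factor reversal tests, can be illustrated with a small Python sketch. The prices and quantities below are hypothetical, and the helper names are illustrative only. Fisher's index satisfies both tests, which the last two printed lines confirm numerically; Laspeyre's index on its own generally does not satisfy the time reversal test.

```python
import math

def weighted_sum(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))

def price_indices(p0, p1, q0, q1):
    """Return (Laspeyre's, Paasche's, Fisher's) index numbers, base = 100."""
    laspeyres = weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100
    paasche = weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100
    fisher = math.sqrt(laspeyres * paasche)
    return laspeyres, paasche, fisher

# Hypothetical data for three commodities
p0, p1 = [10, 8, 12], [12, 10, 15]
q0, q1 = [5, 6, 4], [4, 7, 5]

L, P, F = price_indices(p0, p1, q0, q1)
F_back = price_indices(p1, p0, q1, q0)[2]          # P10 (current year as base)
Q_fisher = price_indices(q0, q1, p0, p1)[2]        # Fisher's quantity index Q01
value_ratio = weighted_sum(p1, q1) / weighted_sum(p0, q0)
print(round(L, 2), round(P, 2), round(F, 2))
print(round(F / 100 * F_back / 100, 10))           # time reversal: 1.0
print(math.isclose(F / 100 * Q_fisher / 100, value_ratio))  # factor reversal: True
```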
Analysis of Time Series

Method of Least Squares of Measuring Trend.
The equation of the straight line trend is $Y = a + bX$.
To determine the values of the constants a and b, the following two normal equations are to be solved:
$\Sigma Y = Na + b\Sigma X$
$\Sigma XY = a\Sigma X + b\Sigma X^2$
When deviations are taken from the middle period so that $\Sigma X = 0$, the values of a and b can be obtained directly as follows:
$a = \dfrac{\Sigma Y}{N}$, $b = \dfrac{\Sigma XY}{\Sigma X^2}$
Second Degree Parabola
The equation of the second degree parabola is $Y = a + bX + cX^2$.
To determine the values of the constants a, b and c, the following three normal equations are to be solved simultaneously:
$\Sigma Y = Na + b\Sigma X + c\Sigma X^2$
$\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3$
$\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4$
When deviations are taken from the middle period so that $\Sigma X = 0$ (with symmetric deviations, $\Sigma X^3 = 0$ as well), the values of a, b and c can be ascertained from:
$\Sigma Y = Na + c\Sigma X^2$
$\Sigma XY = b\Sigma X^2$
$\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4$
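For the straight-line trend with X measured from the middle period, the short-cut $a = \Sigma Y/N$, $b = \Sigma XY/\Sigma X^2$ can be sketched in Python as below (hypothetical helper and data). It assumes an odd number of consecutive years, so that the deviations from the middle year are whole numbers summing to zero.

```python
def straight_line_trend(years, values):
    """Fit Y = a + bX by least squares, with X = year - middle year (Sum X = 0)."""
    n = len(years)
    middle = years[n // 2]                    # odd number of consecutive years assumed
    xs = [t - middle for t in years]
    a = sum(values) / n                                                  # a = Sum Y / N
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)  # b = Sum XY / Sum X^2
    return a, b, middle

years = [2001, 2002, 2003, 2004, 2005]
sales = [80, 90, 92, 83, 94]                  # hypothetical values
a, b, origin = straight_line_trend(years, sales)
print(a, b, origin)                           # trend: Y = a + b * (year - origin)
print([a + b * (t - origin) for t in years])  # trend values
```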
Probability
1. Addition Theorem
(a) When events are mutually exclusive
$P(A \text{ or } B) = P(A) + P(B)$
(b) When events are not mutually exclusive
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
2. Multiplication Theorem
$P(A \text{ and } B) = P(A) \times P(B)$ (when A and B are independent events)
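Theoretical Distributions: Binomial Distribution
$(q + p)^n = q^n + {}^nC_1q^{n-1}p + {}^nC_2q^{n-2}p^2 + \cdots + {}^nC_rq^{n-r}p^r + \cdots + p^n$
The mean of the binomial distribution is np and the standard deviation is $\sqrt{npq}$.
$\mu_2 = npq$, $\mu_3 = npq(q - p)$, $\mu_4 = 3n^2p^2q^2 + npq(1 - 6pq)$
$\beta_1 = \dfrac{(q - p)^2}{npq}$, $\beta_2 = 3 + \dfrac{1 - 6pq}{npq}$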
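Poisson Distribution
The Poisson distribution is given by the expansion
$e^{-m}\left(1 + m + \dfrac{m^2}{2!} + \dfrac{m^3}{3!} + \cdots + \dfrac{m^r}{r!} + \cdots\right)$
The mean of the Poisson distribution is m and the standard deviation is $\sqrt m$.
$\mu_2 = m$, $\mu_3 = m$, $\mu_4 = 3m^2 + m$, $\beta_1 = \dfrac{1}{m}$ and $\beta_2 = 3 + \dfrac{1}{m}$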
Normal Distribution
Normal distribution is defined by the equation
$p(X) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(X - \bar X)^2/2\sigma^2}$
Mean of the normal distribution is $\bar X$ and standard deviation is $\sigma$.
$\mu_2 = \sigma^2$, $\mu_3 = 0$, $\mu_4 = 3\sigma^4$, $\beta_1 = 0$ and $\beta_2 = 3$
Standard Normal Variate or $z = \dfrac{X - \bar X}{\sigma}$


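Tests of Significance
A. Sampling of Attributes
S.E. of the number of successes $= \sqrt{npq}$
S.E. of the proportion of successes $= \sqrt{\dfrac{pq}{n}}$
S.E. of the difference between two proportions
$S.E.(p_1 - p_2) = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$, where $p = \dfrac{n_1p_1 + n_2p_2}{n_1 + n_2}$ and $q = 1 - p$
B. Sampling of Variables (Large Samples)
$S.E._{\bar X} = \dfrac{\sigma}{\sqrt N}$
95% fiducial limits of population mean are $\bar X \pm 1.96\dfrac{\sigma}{\sqrt N}$
$S.E._{\sigma} = \dfrac{\sigma}{\sqrt{2N}}$
$S.E._r = \dfrac{1 - r^2}{\sqrt N}$
$S.E._{b_{yx}} = \dfrac{\sigma_y\sqrt{1 - r^2}}{\sigma_x\sqrt N}$
$S.E._{b_{xy}} = \dfrac{\sigma_x\sqrt{1 - r^2}}{\sigma_y\sqrt N}$
S.E. of the difference between the means of two samples
(i) When samples are drawn from the same population
$S.E.(\bar X_1 - \bar X_2) = \sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
(ii) When samples are drawn from different populations
$S.E.(\bar X_1 - \bar X_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Standard Error of the difference between two standard deviations
$S.E.(\sigma_1 - \sigma_2) = \sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}$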


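Sampling of Variables (Small Samples)
(i) To test the significance of the mean of a random sample
$t = \dfrac{\bar X - \mu}{S}\sqrt n$, where $S = \sqrt{\dfrac{\Sigma(X - \bar X)^2}{n - 1}}$
(ii) To test the difference between the means of two samples
$t = \dfrac{\bar X_1 - \bar X_2}{S}\sqrt{\dfrac{n_1n_2}{n_1 + n_2}}$
$S = \sqrt{\dfrac{\Sigma(X_1 - \bar X_1)^2 + \Sigma(X_2 - \bar X_2)^2}{n_1 + n_2 - 2}}$
or, when deviations are taken from assumed means $A_1$ and $A_2$,
$S = \sqrt{\dfrac{\Sigma(X_1 - A_1)^2 + \Sigma(X_2 - A_2)^2 - n_1(\bar X_1 - A_1)^2 - n_2(\bar X_2 - A_2)^2}{n_1 + n_2 - 2}}$
(iii) Difference Test
$t = \dfrac{\bar d\sqrt n}{S}$, where $S = \sqrt{\dfrac{\Sigma(d - \bar d)^2}{n - 1}}$ or $\sqrt{\dfrac{\Sigma d^2 - n\bar d^2}{n - 1}}$
(iv) For testing the significance of an observed correlation coefficient
$t = \dfrac{r}{\sqrt{1 - r^2}} \times \sqrt{n - 2}$
t is based on (n − 2) degrees of freedom.
(v) z-test of the significance of the correlation coefficient
$z = \dfrac{1}{2}\log_e\dfrac{1 + r}{1 - r}$ or $1.1513\log_{10}\dfrac{1 + r}{1 - r}$
$\xi = \dfrac{1}{2}\log_e\dfrac{1 + \rho}{1 - \rho}$ or $1.1513\log_{10}\dfrac{1 + \rho}{1 - \rho}$
where r is the sample correlation coefficient and $\rho$ is the population correlation coefficient.
$S.E._z = \dfrac{1}{\sqrt{n - 3}}$
(vi) z-test of the significance of the difference between two independent correlation coefficients
$z_1 = \dfrac{1}{2}\log_e\dfrac{1 + r_1}{1 - r_1}$ or $1.1513\log_{10}\dfrac{1 + r_1}{1 - r_1}$, and similarly for $z_2$
$S.E._{z_1 - z_2} = \sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}$
(vii) F-Test
$F = \dfrac{S_1^2}{S_2^2}$, where $S_1^2 = \dfrac{\Sigma(X_1 - \bar X_1)^2}{n_1 - 1}$ and $S_2^2 = \dfrac{\Sigma(X_2 - \bar X_2)^2}{n_2 - 1}$
The numerator is always the greater variance.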
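$\chi^2$ Test and Goodness of Fit
$\chi^2 = \Sigma\dfrac{(O - E)^2}{E}$
Alternative method of obtaining the value of $\chi^2$ (for a 2 × 2 table with cell frequencies a, b, c, d)
$\chi^2 = \dfrac{(ad - bc)^2 N}{(a + c)(b + d)(c + d)(a + b)}$
With Yates' correction
$\chi^2 = \dfrac{\left(|ad - bc| - \frac{N}{2}\right)^2 N}{(a + c)(b + d)(c + d)(a + b)}$
When the degrees of freedom are greater than 30, $\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ is used as a normal deviate, where $\nu$ is the number of degrees of freedom.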
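The 2 × 2 short-cut formula agrees with $\Sigma(O - E)^2/E$ computed from the expected frequencies. A Python sketch with hypothetical cell frequencies a, b (first row) and c, d (second row); the function names are illustrative, and the Yates form follows the formula above exactly (no clamping when $|ad - bc| < N/2$).

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Short-cut chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= n / 2                 # Yates' correction
    return diff**2 * n / ((a + c) * (b + d) * (c + d) * (a + b))

def chi_square_direct(table):
    """Sum (O - E)^2 / E with E = row total * column total / N."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

print(chi_square_2x2(10, 20, 30, 40), chi_square_direct([[10, 20], [30, 40]]))
print(chi_square_2x2(10, 20, 30, 40, yates=True))
```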


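(Interpolation and Extrapolation)
Binomial Expansion Method
$(y - 1)^n = 0$, i.e., $y_n - n\,y_{n-1} + \dfrac{n(n-1)}{2!}\,y_{n-2} - \cdots = 0$

Newton's Method
$y_x = y_0 + x\Delta_0^1 + \dfrac{x(x-1)}{2!}\Delta_0^2 + \dfrac{x(x-1)(x-2)}{3!}\Delta_0^3 + \cdots$
where $y_x$ is the figure to be interpolated and
$x = \dfrac{\text{year of interpolation} - \text{year of origin}}{\text{time difference between the two adjoining years}}$

Lagrange's Method
$y_x = y_0\dfrac{(x - x_1)(x - x_2)\cdots(x - x_n)}{(x_0 - x_1)(x_0 - x_2)\cdots(x_0 - x_n)} + y_1\dfrac{(x - x_0)(x - x_2)\cdots(x - x_n)}{(x_1 - x_0)(x_1 - x_2)\cdots(x_1 - x_n)} + \cdots + y_n\dfrac{(x - x_0)(x - x_1)\cdots(x - x_{n-1})}{(x_n - x_0)(x_n - x_1)\cdots(x_n - x_{n-1})}$
where $y_x$ is the figure to be interpolated, x is the value in the series for which $y_x$ is to be interpolated, $x_0, x_1, x_2, \ldots, x_n$ are the given values of the x-variable and $y_0, y_1, y_2, \ldots, y_n$ are the corresponding given values of the y-variable.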
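Lagrange's formula translates directly into code. A minimal Python sketch (hypothetical helper name and data); unlike Newton's method it does not require the x-values to be equally spaced.

```python
def lagrange_interpolate(xs, ys, x):
    """y_x = sum over i of y_i * product over j != i of (x - x_j) / (x_i - x_j)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical data lying on y = x^2, with unequal spacing of x
print(round(lagrange_interpolate([1, 2, 4], [1, 4, 16], 3), 6))  # 9.0
```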
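Statistical Quality Control

Upper and lower control limits for the mean are given by
$\bar{\bar X} \pm A_2\bar R$
Control limits for Range
$D_3\bar R$ and $D_4\bar R$
Upper and lower control limits for p chart
$\bar p \pm 3\sqrt{\dfrac{\bar p(1 - \bar p)}{n}}$
Control limits for c chart
$\bar c \pm 3\sqrt{\bar c}$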
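Partial and Multiple Correlation
Partial Correlation
$r_{12.3} = \dfrac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$
$r_{13.2} = \dfrac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}}$
$r_{23.1} = \dfrac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}}$
$r_{12.34} = \dfrac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}}$
$r_{13.24} = \dfrac{r_{13.4} - r_{12.4}r_{23.4}}{\sqrt{1 - r_{12.4}^2}\,\sqrt{1 - r_{23.4}^2}}$
$r_{14.23} = \dfrac{r_{14.3} - r_{12.3}r_{24.3}}{\sqrt{1 - r_{12.3}^2}\,\sqrt{1 - r_{24.3}^2}}$
Multiple Correlation
$R_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}}$ or $R_{1.23} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)}$
$R_{2.13} = \sqrt{\dfrac{r_{12}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{13}^2}}$ or $R_{2.13} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{23.1}^2)}$
$R_{3.12} = \sqrt{\dfrac{r_{13}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{12}^2}}$ or $R_{3.12} = \sqrt{1 - (1 - r_{13}^2)(1 - r_{23.1}^2)}$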
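The first-order partial and the multiple correlation formulae can be evaluated with a short Python sketch. The correlation values used are hypothetical and the helper names are illustrative. The final printed value checks numerically that the two forms given for $R_{1.23}$ agree.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial r_ab.c = (r_ab - r_ac r_bc) / sqrt((1 - r_ac^2)(1 - r_bc^2))."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

def multiple_r_1_23(r12, r13, r23):
    """R_1.23 = sqrt((r12^2 + r13^2 - 2 r12 r13 r23) / (1 - r23^2))."""
    return math.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))

r12, r13, r23 = 0.6, 0.5, 0.3                       # hypothetical values
r12_3 = partial_r(r12, r13, r23)                    # r_12.3
r13_2 = partial_r(r13, r12, r23)                    # r_13.2
R = multiple_r_1_23(r12, r13, r23)
alt = math.sqrt(1 - (1 - r12**2) * (1 - r13_2**2))  # second form of R_1.23
print(round(r12_3, 4), round(r13_2, 4), round(R, 4), math.isclose(R, alt))
```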


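Multiple Regression Equation
Regression equation of $X_1$ on $X_2$ and $X_3$: $X_1 = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$
The values of $a_{1.23}$, $b_{12.3}$ and $b_{13.2}$ are obtained by solving the following three normal equations:
$\Sigma X_1 = Na_{1.23} + b_{12.3}\Sigma X_2 + b_{13.2}\Sigma X_3$
$\Sigma X_1X_2 = a_{1.23}\Sigma X_2 + b_{12.3}\Sigma X_2^2 + b_{13.2}\Sigma X_2X_3$
$\Sigma X_1X_3 = a_{1.23}\Sigma X_3 + b_{12.3}\Sigma X_2X_3 + b_{13.2}\Sigma X_3^2$
The normal equations can be obtained by multiplying the regression equation successively by 1, $X_2$ and $X_3$ and summing on both sides.

Regression of $X_2$ on $X_1$ and $X_3$: $X_2 = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3$
Regression of $X_3$ on $X_1$ and $X_2$: $X_3 = a_{3.12} + b_{31.2}X_1 + b_{32.1}X_2$
$\Sigma X_3 = Na_{3.12} + b_{31.2}\Sigma X_1 + b_{32.1}\Sigma X_2$
$\Sigma X_1X_3 = a_{3.12}\Sigma X_1 + b_{31.2}\Sigma X_1^2 + b_{32.1}\Sigma X_1X_2$
$\Sigma X_2X_3 = a_{3.12}\Sigma X_2 + b_{31.2}\Sigma X_1X_2 + b_{32.1}\Sigma X_2^2$
The regression equation of $X_1$ on $X_2$ and $X_3$ can also be written as below:
$X_1 - \bar X_1 = b_{12.3}(X_2 - \bar X_2) + b_{13.2}(X_3 - \bar X_3)$
Similarly, the regression equation of $X_2$ on $X_1$ and $X_3$ can be written as below:
$X_2 - \bar X_2 = b_{21.3}(X_1 - \bar X_1) + b_{23.1}(X_3 - \bar X_3)$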


APPENDIX FIVE


Statistical Tables


I. Logarithms
II. Antilogarithms
III. Powers, Roots and Reciprocals
IV. Square Roots of Numbers from 1 to 10
V. Reciprocals of Numbers
VI. Binomial Coefficients
VII. Values of $e^{-m}$
VIII. Area under the Normal Curve
IX. Ordinates of the Standard Normal Curve
X. Values of t
XI. Values of $\chi^2$
XII. 5% Points of F
XIII. 1% Points of F
XIV. Control Chart Constants
XV. Random Numbers
XVI. Conversion of a Pearsonian r into the corresponding z coefficient


Note. For undergraduate classes, normally only the first five tables are of relevance.


I. LOGARITHMS

[Table I: Four-figure common logarithms of numbers 10 to 99, with mean differences 1 to 9 (table body not reproduced).]

$\pi = 3.14159$, $\log \pi = 0.49715$
$e = 2.71828$, $\log e = M = 0.43429$
$\log_e x = \dfrac{1}{M}\log_{10} x$, where $\dfrac{1}{M} = 2.30259$
$\log_{10} x = M\log_e x$
Multiples of M for 1 to 9: 0.4343, 0.8686, 1.3029, 1.7372, 2.1715, 2.6058, 3.0401, 3.4744, 3.9087
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xxxii LOG-ANTiLOG TABLES 


П. ANTILOGRITHMS 


1002, 1005-1007 | 1009 1012 1014 | tot6 1019 


1026 10:8 тозо | 1033 1035 1038 | 1040 1042 
1050 1052 1054 | 1057 1059 1062 | 1064 1067 
1024 1076 1079 | 1081 1084 1086 | 1089 109! 


1099 1102 1104 | 1107 1109 1112 | 1114 1117 
1125 (027 030 | 1132 1035 0138 | 1140 1143 
1051 9153. 1156 | 1159 1161 1164 | 1167 1169 
1178. 1180. 1183 | 1186 1189 1191 | 1194 1197 
7120$ 1208 1211 | 1213 1216 1219 | 1222 1225 
1233 1236 1239 | 1242 1245 1247 | 1250 1253 
1262. 1265 1268 | 1271 1274 1276 | 1279 +282 
4291 1294 1297 | 1300 1393 1306 | 1309 1312 
1341 1314 1327 | 1330 1334 1337 | 1340 1343 
13521355 1358 | 1361 1365 1368 | 137: 1374 
1384) 1387 1350 | 1393 1396 1400 | 1403 1406 
14161 1419 14227111426 1429 1432 | 1433 1439 
1449 1452 1455 | 1459 1462 1466 | 1469 1472 
1483 1486 1489 | 1493 1496 1500 | 1503 1507 
1512, 1521 15247] 1528 1331 153$ | 1538 1542 
1553 1456 1560 1156231567 1570 | 1574 1578 
1589 1592 1595 | 1600 1603 1607 | 1611 1614 


(a 
A 


mx" 
m 


:3$45352 $82 à 
GGA qua «EA в EAR REL ARE э NEAN VEO Ra eO e aus 


| 1626: 1629 1613 -| 1637 1541 1644 | 1648 1642 
| 1663) 1667 1671/1 1675 1679 1683 | 1687 1690 
| 1702, 1706 1710 | 1714 1718 1713 1726 1730 


1742) 1740 1750 | 1754 1758 1762 | 1766 1770 
1782) 1286 1791 | 1795 1799 1803 | 1807 18:1 
1824; 1828 1832 | 1837 1841 1845 | 1849 1854 


1865 1371 1875 | 1879 1884 1888 | 1892 1897 
1910. 19!4 191911 1913 1928 1952 | 1936 1941 
1954 1959 1963 | 1958 1972 1977 | 1982 1986 


2000 2004 2009 | 2014 2018 2023 | 2028 2032 


1046 2051 2056 | 2061 2065 2070 | 2075 2080 
2094 2069 21104 | 2109 2113 2118 | 2123 2128 
2143 2148 2153| 2158 2163 2168 | 2173 2178 


2193 2198 2203 | 2208 2213 2218 | 2223 2228 
2244 2249 2254 | 2259 2265 2270 | 2275 2280 
2196 2301 2307 | 2312 2317 2323 | 2328 2333 


2350 2355 2360 | 2366 237: 2377 | 2382 2388 
2404 2410 2415 | 2421-2427 2431 | 2438 2443 
2460 2466 2472 | 2477 2483 2489 | 2495 2500 


2518 2523 2529 | 253572541 2547 | 2553 2559 


2576 2582 2588 | 2594 2600 2606 | 2612 2618 

2636 2642 2649 | 2655 2661 2667 | 2673 2679 

2698 2704 2710 | 2716 2723 2729 | 2735 2742 

2761 2767 2773 | 2780 2786 2793 | 2799 2805 

2825 2831 2838 | 2844 2851 2858 | 2864 2871 

189! 2897 2904 | 1911 2917 2924 | 2931 2938 

2958 2965 2072 | 2979 2985 2992 | 2999 3006 

3017 3034 3041.| 3948 3055 3062 

3097 3105 3112 | 3119. 3126 3133 | 314t 3148 3155 
» 3 
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LOG-ANTILOG TABLES xxxiii 
Il. ANTILOGRITHMS 


[zr males] 

195 
o з 3 4 • Iw 8 

3 3 > s «ADD 
.50 | 3162 | 3170 3177 3184 | 3192 3199 3206 | 3214 3221 3228 | 1 1 2 | 3 4 4 | 5 6 7
.51 | 3236 | 3243 3251 3258 | 3266 3273 3281 | 3289 3296 3304 | 1 2 2 | 3 4 5 | 5 6 7
.52 | 3311 | 3319 3327 3334 | 3342 3350 3357 | 3365 3373 3381 | 1 2 2 | 3 4 5 | 5 6 7
.53 | 3388 | 3396 3404 3412 | 3420 3428 3436 | 3443 3451 3459 | 1 2 2 | 3 4 5 | 6 6 7
.54 | 3467 | 3475 3483 3491 | 3499 3508 3516 | 3524 3532 3540 | 1 2 2 | 3 4 5 | 6 6 7
.55 | 3548 | 3556 3565 3573 | 3581 3589 3597 | 3606 3614 3622 | 1 2 2 | 3 4 5 | 6 7 7
.56 | 3631 | 3639 3648 3656 | 3664 3673 3681 | 3690 3698 3707 | 1 2 3 | 3 4 5 | 6 7 8
.57 | 3715 | 3724 3733 3741 | 3750 3758 3767 | 3776 3784 3793 | 1 2 3 | 3 4 5 | 6 7 8
.58 | 3802 | 3811 3819 3828 | 3837 3846 3855 | 3864 3873 3882 | 1 2 3 | 4 4 5 | 6 7 8
.59 | 3890 | 3899 3908 3917 | 3926 3936 3945 | 3954 3963 3972 | 1 2 3 | 4 5 5 | 6 7 8
.60 | 3981 | 3990 3999 4009 | 4018 4027 4036 | 4046 4055 4064 | 1 2 3 | 4 5 6 | 6 7 8
.61 | 4074 | 4083 4093 4102 | 4111 4121 4130 | 4140 4150 4159 | 1 2 3 | 4 5 6 | 7 8 9
.62 | 4169 | 4178 4188 4198 | 4207 4217 4227 | 4236 4246 4256 | 1 2 3 | 4 5 6 | 7 8 9
.63 | 4266 | 4276 4285 4295 | 4305 4315 4325 | 4335 4345 4355 | 1 2 3 | 4 5 6 | 7 8 9
.64 | 4365 | 4375 4385 4395 | 4406 4416 4426 | 4436 4446 4457 | 1 2 3 | 4 5 6 | 7 8 9
.65 | 4467 | 4477 4487 4498 | 4508 4519 4529 | 4539 4550 4560 | 1 2 3 | 4 5 6 | 7 8 9
.66 | 4571 | 4581 4592 4603 | 4613 4624 4634 | 4645 4656 4667 | 1 2 3 | 4 5 6 | 7 9 10
.67 | 4677 | 4688 4699 4710 | 4721 4732 4742 | 4753 4764 4775 | 1 2 3 | 4 5 7 | 8 9 10
.68 | 4786 | 4797 4808 4819 | 4831 4842 4853 | 4864 4875 4887 | 1 2 3 | 4 6 7 | 8 9 10
.69 | 4898 | 4909 4920 4932 | 4943 4955 4966 | 4977 4989 5000 | 1 2 3 | 5 6 7 | 8 9 10
.70 | 5012 | 5023 5035 5047 | 5058 5070 5082 | 5093 5105 5117 | 1 2 4 | 5 6 7 | 8 9 11
.71 | 5129 | 5140 5152 5164 | 5176 5188 5200 | 5212 5224 5236 | 1 2 4 | 5 6 7 | 8 10 11
.72 | 5248 | 5260 5272 5284 | 5297 5309 5321 | 5333 5346 5358 | 1 2 4 | 5 6 7 | 9 10 11
.73 | 5370 | 5383 5395 5408 | 5420 5433 5445 | 5458 5470 5483 | 1 3 4 | 5 6 8 | 9 10 11
.74 | 5495 | 5508 5521 5534 | 5546 5559 5572 | 5585 5598 5610 | 1 3 4 | 5 6 8 | 9 10 12
.75 | 5623 | 5636 5649 5662 | 5675 5689 5702 | 5715 5728 5741 | 1 3 4 | 5 7 8 | 9 10 12
.76 | 5754 | 5768 5781 5794 | 5808 5821 5834 | 5848 5861 5875 | 1 3 4 | 5 7 8 | 9 11 12
.77 | 5888 | 5902 5916 5929 | 5943 5957 5970 | 5984 5998 6012 | 1 3 4 | 5 7 8 | 10 11 12
.78 | 6026 | 6039 6053 6067 | 6081 6095 6109 | 6124 6138 6152 | 1 3 4 | 6 7 8 | 10 11 13
.79 | 6166 | 6180 6194 6209 | 6223 6237 6252 | 6266 6281 6295 | 1 3 4 | 6 7 9 | 10 11 13
.80 | 6310 | 6324 6339 6353 | 6368 6383 6397 | 6412 6427 6442 | 1 3 4 | 6 7 9 | 10 12 13
.81 | 6457 | 6471 6486 6501 | 6516 6531 6546 | 6561 6577 6592 | 2 3 5 | 6 8 9 | 11 12 14
.82 | 6607 | 6622 6637 6653 | 6668 6683 6699 | 6714 6730 6745 | 2 3 5 | 6 8 9 | 11 12 14
.83 | 6761 | 6776 6792 6808 | 6823 6839 6855 | 6871 6887 6902 | 2 3 5 | 6 8 9 | 11 13 14
.84 | 6918 | 6934 6950 6966 | 6982 6998 7015 | 7031 7047 7063 | 2 3 5 | 6 8 10 | 11 13 15
.85 | 7079 | 7096 7112 7129 | 7145 7161 7178 | 7194 7211 7228 | 2 3 5 | 7 8 10 | 12 13 15
.86 | 7244 | 7261 7278 7295 | 7311 7328 7345 | 7362 7379 7396 | 2 3 5 | 7 8 10 | 12 13 15
.87 | 7413 | 7430 7447 7464 | 7482 7499 7516 | 7534 7551 7568 | 2 3 5 | 7 9 10 | 12 14 16
.88 | 7586 | 7603 7621 7638 | 7656 7674 7691 | 7709 7727 7745 | 2 4 5 | 7 9 11 | 12 14 16
.89 | 7762 | 7780 7798 7816 | 7834 7852 7870 | 7889 7907 7925 | 2 4 5 | 7 9 11 | 13 14 16
.90 | 7943 | 7962 7980 7998 | 8017 8035 8054 | 8072 8091 8110 | 2 4 6 | 7 9 11 | 13 15 17
.91 | 8128 | 8147 8166 8185 | 8204 8222 8241 | 8260 8279 8299 | 2 4 6 | 8 9 11 | 13 15 17
.92 | 8318 | 8337 8356 8375 | 8395 8414 8433 | 8453 8472 8492 | 2 4 6 | 8 10 12 | 14 15 17
.93 | 8511 | 8531 8551 8570 | 8590 8610 8630 | 8650 8670 8690 | 2 4 6 | 8 10 12 | 14 16 18
.94 | 8710 | 8730 8750 8770 | 8790 8810 8831 | 8851 8872 8892 | 2 4 6 | 8 10 12 | 14 16 18
.95 | 8913 | 8933 8954 8974 | 8995 9016 9036 | 9057 9078 9099 | 2 4 6 | 8 10 12 | 15 17 19
.96 | 9120 | 9141 9162 9183 | 9204 9226 9247 | 9268 9290 9311 | 2 4 6 | 8 11 13 | 15 17 19
.97 | 9333 | 9354 9376 9397 | 9419 9441 9462 | 9484 9506 9528 | 2 4 7 | 9 11 13 | 15 17 20
.98 | 9550 | 9572 9594 9616 | 9638 9661 9683 | 9705 9727 9750 | 2 4 7 | 9 11 13 | 16 18 20
.99 | 9772 | 9795 9817 9840 | 9863 9886 9908 | 9931 9954 9977 | 2 5 7 | 9 11 14 | 16 18 20
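The sketch below shows how the antilogarithm entries and the ADD columns fit together. It makes two assumptions. First, each entry is 10 raised to the mantissa, scaled to four figures and rounded half up. Second, the mean difference for a fourth digit k is k times the average change per step across the row. The book does not say how its mean differences were derived, so the printed ones may differ from these by one unit in a few rows. The function names are illustrative only.

```python
import math


def half_up(value: float) -> int:
    """Round to the nearest integer, halves rounded up (usual table convention)."""
    return math.floor(value + 0.5)


def antilog_row(x: float) -> list[int]:
    """Four-figure entries for mantissas x, x + .001, ..., x + .009 (e.g. x = 0.54)."""
    return [half_up(1000 * 10 ** (x + j / 1000)) for j in range(10)]


def mean_differences(x: float) -> list[int]:
    """Corrections for a fourth mantissa digit k = 1..9 in the row starting at x."""
    step = 1000 * (10 ** (x + 0.01) - 10 ** x) / 10  # average change per .001
    return [half_up(k * step / 10) for k in range(1, 10)]


def antilog_from_table(log_value: float) -> float:
    """Antilog of a four-decimal logarithm, looked up the way the table is read."""
    characteristic = math.floor(log_value)
    digits = round((log_value - characteristic) * 10000)  # e.g. 5432 for .5432
    row, col, extra = digits // 100, (digits // 10) % 10, digits % 10
    x = row / 100
    entry = antilog_row(x)[col]
    if extra:
        entry += mean_differences(x)[extra - 1]  # mean differences are ADDED
    return entry / 1000 * 10 ** characteristic


print(antilog_row(0.50))           # [3162, 3170, 3177, 3184, 3192, 3199, 3206, 3214, 3221, 3228]
print(antilog_from_table(0.5432))  # 3.493
```

For example, the antilog of 2.5432 is found by reading row .54 under column 3 (3491) and adding the mean difference under 2 (2), which gives 3493. The characteristic 2 then places the decimal point: 349.3.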




xxxvi SQUARE ROOTS OF NUMBERS FROM 1 TO 10


SQUARE ROOTS FROM 1 to 10 




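IV. SQUARE ROOTS FROM 1.0 TO 10

x   | 0     1     2     3     4     5     6     7     8     9
5.5 | 2.345 2.347 2.349 2.352 2.354 2.356 2.358 2.360 2.362 2.364
5.6 | 2.366 2.369 2.371 2.373 2.375 2.377 2.379 2.381 2.383 2.385
5.7 | 2.387 2.390 2.392 2.394 2.396 2.398 2.400 2.402 2.404 2.406
5.8 | 2.408 2.410 2.412 2.415 2.417 2.419 2.421 2.423 2.425 2.427
5.9 | 2.429 2.431 2.433 2.435 2.437 2.439 2.441 2.443 2.445 2.447
6.0 | 2.449 2.452 2.454 2.456 2.458 2.460 2.462 2.464 2.466 2.468
6.1 | 2.470 2.472 2.474 2.476 2.478 2.480 2.482 2.484 2.486 2.488
6.2 | 2.490 2.492 2.494 2.496 2.498 2.500 2.502 2.504 2.506 2.508
6.3 | 2.510 2.512 2.514 2.516 2.518 2.520 2.522 2.524 2.526 2.528
6.4 | 2.530 2.532 2.534 2.536 2.538 2.540 2.542 2.544 2.546 2.548
6.5 | 2.550 2.551 2.553 2.555 2.557 2.559 2.561 2.563 2.565 2.567

8.3 | 2.881 2.883 2.884 2.886 2.888 2.890 2.891 2.893 2.895 2.897
8.4 | 2.898 2.900 2.902 2.903 2.905 2.907 2.909 2.910 2.912 2.914
8.5 | 2.915 2.917 2.919 2.921 2.922 2.924 2.926 2.927 2.929 2.931
8.6 | 2.933 2.934 2.936 2.938 2.939 2.941 2.943 2.944 2.946 2.948
8.7 | 2.950 2.951 2.953 2.955 2.956 2.958 2.960 2.961 2.963 2.965
8.8 | 2.966 2.968 2.970 2.972 2.973 2.975 2.977 2.978 2.980 2.982
8.9 | 2.983 2.985 2.987 2.988 2.990 2.992 2.993 2.995 2.997 2.998
9.0 | 3.000 3.002 3.003 3.005 3.007 3.008 3.010 3.012 3.013 3.015
9.1 | 3.017 3.018 3.020 3.022 3.023 3.025 3.027 3.028 3.030 3.032
9.2 | 3.033 3.035 3.036 3.038 3.040 3.041 3.043 3.045 3.046 3.048
9.3 | 3.050 3.051 3.053 3.055 3.056 3.058 3.059 3.061 3.063 3.064
9.4 | 3.066 3.068 3.069 3.071 3.072 3.074 3.076 3.077 3.079 3.081
9.5 | 3.082 3.084 3.085 3.087 3.089 3.090 3.092 3.094 3.095 3.097
9.6 | 3.098 3.100 3.102 3.103 3.105 3.106 3.108 3.110 3.111 3.113
9.7 | 3.114 3.116 3.118 3.119 3.121 3.122 3.124 3.126 3.127 3.129
9.8 | 3.130 3.132 3.134 3.135 3.137 3.138 3.140 3.142 3.143 3.145
9.9 | 3.146 3.148 3.150 3.151 3.153 3.154 3.156 3.158 3.159 3.161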
xxxviii RECIPROCALS OF NUMBERS FROM 1 TO 10

RECIPROCALS OF NUMBERS FROM 1 TO 10
(Numbers in difference columns to be subtracted, not added.)
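The note about subtracting the differences follows from the fact that 1/x falls as x grows. To first order, 1/(a + d) ≈ 1/a − d/a², so the correction for an extra digit is taken away rather than added. The sketch below assumes only this first-order rule. It is not a reconstruction of how the book's difference columns were built, and the function name is hypothetical.

```python
def reciprocal_by_difference(a: float, d: float) -> float:
    """Approximate 1/(a + d) from 1/a, with the first-order correction subtracted."""
    correction = d / a ** 2  # plays the role of the tabulated difference
    return 1 / a - correction


approx = reciprocal_by_difference(2.34, 0.005)  # 1/2.345 built from 1/2.34
exact = 1 / 2.345
print(round(approx, 4), round(exact, 4))  # both round to 0.4264
```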
xl BINOMIAL COEFFICIENTS

VI. Binomial Coefficients

n \ r | 0   1   2    3     4     5      6      7       8       9       10
 1    | 1   1
 2    | 1   2   1
 3    | 1   3   3    1
 4    | 1   4   6    4     1
 5    | 1   5   10   10    5     1
 6    | 1   6   15   20    15    6      1
 7    | 1   7   21   35    35    21     7      1
 8    | 1   8   28   56    70    56     28     8       1
 9    | 1   9   36   84    126   126    84     36      9       1
10    | 1   10  45   120   210   252    210    120     45      10      1
11    | 1   11  55   165   330   462    462    330     165     55      11
12    | 1   12  66   220   495   792    924    792     495     220     66
13    | 1   13  78   286   715   1287   1716   1716    1287    715     286
14    | 1   14  91   364   1001  2002   3003   3432    3003    2002    1001
15    | 1   15  105  455   1365  3003   5005   6435    6435    5005    3003
16    | 1   16  120  560   1820  4368   8008   11440   12870   11440   8008
17    | 1   17  136  680   2380  6188   12376  19448   24310   24310   19448
18    | 1   18  153  816   3060  8568   18564  31824   43758   48620   43758
19    | 1   19  171  969   3876  11628  27132  50388   75582   92378   92378
20    | 1   20  190  1140  4845  15504  38760  77520   125970  167960  184756
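Each row of the binomial table can be built from the row above it with the addition rule C(n, r) = C(n − 1, r − 1) + C(n − 1, r), which gives Pascal's triangle. The minimal sketch below builds the rows that way and checks them against Python's `math.comb`. The function name is illustrative.

```python
from math import comb


def pascal_rows(n_max: int) -> list[list[int]]:
    """Rows 0..n_max of Pascal's triangle, using C(n, r) = C(n-1, r-1) + C(n-1, r)."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        middle = [prev[r - 1] + prev[r] for r in range(1, n)]
        rows.append([1] + middle + [1])
    return rows


rows = pascal_rows(20)
print(rows[20][:11])  # the n = 20 row as printed (r = 0..10)
assert all(rows[n][r] == comb(n, r) for n in range(21) for r in range(n + 1))
```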

VII. Values of e^-m

(0 < m < 1)

m   | 0      1     2     3     4     | 5     6     7     8     9
0.0 | 1.0000 .9900 .9802 .9704 .9608 | .9512 .9418 .9324 .9231 .9139
0.1 | .9048  .8958 .8869 .8781 .8694 | .8607 .8521 .8437 .8353 .8270
0.2 | .8187  .8106 .8025 .7945 .7866 | .7788 .7711 .7634 .7558 .7483
0.3 | .7408  .7334 .7261 .7189 .7118 | .7047 .6977 .6907 .6839 .6771
0.4 | .6703  .6637 .6570 .6505 .6440 | .6376 .6313 .6250 .6188 .6126
0.5 | .6065  .6005 .5945 .5886 .5827 | .5769 .5712 .5655 .5599 .5543
0.6 | .5488  .5434 .5379 .5326 .5273 | .5220 .5169 .5117 .5066 .5016
0.7 | .4966  .4916 .4868 .4819 .4771 | .4724 .4677 .4630 .4584 .4538
0.8 | .4493  .4449 .4404 .4360 .4317 | .4274 .4232 .4190 .4148 .4107
0.9 | .4066  .4025 .3985 .3946 .3906 | .3867 .3829 .3791 .3753 .3716

(m = 1, 2, 3, ..., 10)

m     | 1      2      3      4      5       6       7       8       9       10
e^-m  | .36788 .13534 .04979 .01832 .006738 .002479 .000912 .000335 .000123 .000045

Note. To obtain values of e^-m for other values of m, use the laws of exponents.

Example. e^-2.35 = (e^-2.00)(e^-0.35) = (.13534)(.7047) = .095374
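The worked example rests on e^-(a+b) = e^-a · e^-b. The sketch below repeats it with Python's `math.exp`. It assumes that each factor is rounded to the number of places printed in the table (five for e^-2, four for e^-0.35), so it reproduces the book's .095374 and also shows how close that product is to the direct value. The function name is illustrative.

```python
import math


def exp_neg_by_parts(whole: int, fraction: float,
                     places_whole: int = 5, places_fraction: int = 4) -> float:
    """e**-(whole + fraction) as a product of two rounded table entries."""
    return round(math.exp(-whole), places_whole) * round(math.exp(-fraction), places_fraction)


approx = exp_neg_by_parts(2, 0.35)  # (.13534)(.7047)
print(round(approx, 6))             # 0.095374
print(math.exp(-2.35))              # direct value, for comparison
```

The two results differ only in the sixth decimal place, which is the size of error the rounded table entries allow.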


AREA UNDER THE NORMAL CURVE xli

VIII. AREA UNDER NORMAL CURVE

The entries in the table are the probabilities that a random variable having the standard normal distribution assumes a value between 0 and z; they are given by the area under the curve shaded in the figure.

[Figure: The Standard Normal Distribution, with the area under the curve between 0 and z shaded]

z   | .00   .01   .02   .03   .04   | .05   .06   .07   .08   .09
0.0 | .0000 .0040 .0080 .0120 .0160 | .0199 .0239 .0279 .0319 .0359
0.1 | .0398 .0438 .0478 .0517 .0557 | .0596 .0636 .0675 .0714 .0753
0.2 | .0793 .0832 .0871 .0910 .0948 | .0987 .1026 .1064 .1103 .1141
0.3 | .1179 .1217 .1255 .1293 .1331 | .1368 .1406 .1443 .1480 .1517
0.4 | .1554 .1591 .1628 .1664 .1700 | .1736 .1772 .1808 .1844 .1879
0.5 | .1915 .1950 .1985 .2019 .2054 | .2088 .2123 .2157 .2190 .2224
0.6 | .2257 .2291 .2324 .2357 .2389 | .2422 .2454 .2486 .2517 .2549
0.7 | .2580 .2611 .2642 .2673 .2704 | .2734 .2764 .2794 .2823 .2852
0.8 | .2881 .2910 .2939 .2967 .2995 | .3023 .3051 .3078 .3106 .3133
0.9 | .3159 .3186 .3212 .3238 .3264 | .3289 .3315 .3340 .3365 .3389
1.0 | .3413 .3438 .3461 .3485 .3508 | .3531 .3554 .3577 .3599 .3621
1.1 | .3643 .3665 .3686 .3708 .3729 | .3749 .3770 .3790 .3810 .3830
1.2 | .3849 .3869 .3888 .3907 .3925 | .3944 .3962 .3980 .3997 .4015
1.3 | .4032 .4049 .4066 .4082 .4099 | .4115 .4131 .4147 .4162 .4177
1.4 | .4192 .4207 .4222 .4236 .4251 | .4265 .4279 .4292 .4306 .4319
1.5 | .4332 .4345 .4357 .4370 .4382 | .4394 .4406 .4418 .4429 .4441
1.6 | .4452 .4463 .4474 .4484 .4495 | .4505 .4515 .4525 .4535 .4545
1.7 | .4554 .4564 .4573 .4582 .4591 | .4599 .4608 .4616 .4625 .4633
1.8 | .4641 .4649 .4656 .4664 .4671 | .4678 .4686 .4693 .4699 .4706
1.9 | .4713 .4719 .4726 .4732 .4738 | .4744 .4750 .4756 .4761 .4767
2.0 | .4772 .4778 .4783 .4788 .4793 | .4798 .4803 .4808 .4812 .4817
2.1 | .4821 .4826 .4830 .4834 .4838 | .4842 .4846 .4850 .4854 .4857
2.2 | .4861 .4864 .4868 .4871 .4875 | .4878 .4881 .4884 .4887 .4890
2.3 | .4893 .4896 .4898 .4901 .4904 | .4906 .4909 .4911 .4913 .4916
2.4 | .4918 .4920 .4922 .4925 .4927 | .4929 .4931 .4932 .4934 .4936
2.5 | .4938 .4940 .4941 .4943 .4945 | .4946 .4948 .4949 .4951 .4952
2.6 | .4953 .4955 .4956 .4957 .4959 | .4960 .4961 .4962 .4963 .4964
2.7 | .4965 .4966 .4967 .4968 .4969 | .4970 .4971 .4972 .4973 .4974
2.8 | .4974 .4975 .4976 .4977 .4977 | .4978 .4979 .4979 .4980 .4981
2.9 | .4981 .4982 .4982 .4983 .4984 | .4984 .4985 .4985 .4986 .4986
3.0 | .4987 .4987 .4987 .4988 .4988 | .4989 .4989 .4989 .4990 .4990
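The tabulated area between 0 and z equals Φ(z) − 1/2, where Φ is the standard normal distribution function. Φ can be written through the error function as Φ(z) = (1 + erf(z/√2))/2. The minimal sketch below uses Python's `math.erf` to regenerate one row of the area table. The function name is illustrative.

```python
import math


def area_zero_to_z(z: float) -> float:
    """P(0 < Z < z) for a standard normal variable Z."""
    return 0.5 * math.erf(z / math.sqrt(2))


row_1_5 = [round(area_zero_to_z(1.5 + j / 100), 4) for j in range(10)]
print(row_1_5)                          # compare with the 1.5 row of the table
print(round(area_zero_to_z(1.96), 4))   # area for the familiar 5% two-tailed point
```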


ORDINATE (Y) OF THE STANDARD NORMAL CURVE AT Z

IX. ORDINATES (Y) OF THE STANDARD NORMAL CURVE AT z

z   | 0.00  0.01  0.02  0.03  0.04  | 0.05  0.06  0.07  0.08  0.09
3.1 | .0033 .0032 .0031 .0030 .0029 | .0028 .0027 .0026 .0025 .0025
3.2 | .0024 .0023 .0022 .0022 .0021 | .0020 .0020 .0019 .0018 .0018
3.3 | .0017 .0017 .0016 .0016 .0015 | .0015 .0014 .0014 .0013 .0013
3.4 | .0012 .0012 .0012 .0011 .0011 | .0010 .0010 .0010 .0009 .0009
3.5 | .0009 .0008 .0008 .0008 .0008 | .0007 .0007 .0007 .0007 .0006
3.6 | .0006 .0006 .0006 .0005 .0005 | .0005 .0005 .0005 .0005 .0004
3.7 | .0004 .0004 .0004 .0004 .0004 | .0004 .0003 .0003 .0003 .0003
3.8 | .0003 .0003 .0003 .0003 .0003 | .0002 .0002 .0002 .0002 .0002
3.9 | .0002 .0002 .0002 .0002 .0002 | .0002 .0002 .0002 .0001 .0001
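The ordinate Y at z is the height of the standard normal density, exp(−z²/2)/√(2π). The sketch below evaluates that formula with the standard `math` module. Rounded to four places, it reproduces the 3.1 row above. The function name is illustrative.

```python
import math


def ordinate(z: float) -> float:
    """Height of the standard normal curve at z: exp(-z**2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


row_3_1 = [round(ordinate(3.1 + j / 100), 4) for j in range(10)]
print(row_3_1)
print(round(ordinate(0.0), 4))  # peak of the curve at z = 0
```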


xliii VALUES OF t

X. Values of t

d.f. | P = 0.20  0.10   0.05    0.02    0.01   | d.f.
1    | 3.078     6.314  12.706  31.821  63.657 | 1
2    | 1.886     2.920  4.303   6.965   9.925  | 2
3    | 1.638     2.353  3.182   4.541   5.841  | 3
4    | 1.533     2.132  2.776   3.747   4.604  | 4
5    | 1.476     2.015  2.571   3.365   4.032  | 5
6    | 1.440     1.943  2.447   3.143   3.707  | 6
7    | 1.415     1.895  2.365   2.998   3.499  | 7
8    | 1.397     1.860  2.306   2.896   3.355  | 8
9    | 1.383     1.833  2.262   2.821   3.250  | 9
10   | 1.372     1.812  2.228   2.764   3.169  | 10
11   | 1.363     1.796  2.201   2.718   3.106  | 11
12   | 1.356     1.782  2.179   2.681   3.055  | 12
13   | 1.350     1.771  2.160   2.650   3.012  | 13
14   | 1.345     1.761  2.145   2.624   2.977  | 14
15   | 1.341     1.753  2.131   2.602   2.947  | 15
16   | 1.337     1.746  2.120   2.583   2.921  | 16
17   | 1.333     1.740  2.110   2.567   2.898  | 17
18   | 1.330     1.734  2.101   2.552   2.878  | 18
19   | 1.328     1.729  2.093   2.539   2.861  | 19
20   | 1.325     1.725  2.086   2.528   2.845  | 20
21   | 1.323     1.721  2.080   2.518   2.831  | 21
22   | 1.321     1.717  2.074   2.508   2.819  | 22
23   | 1.319     1.714  2.069   2.500   2.807  | 23
24   | 1.318     1.711  2.064   2.492   2.797  | 24
25   | 1.316     1.708  2.060   2.485   2.787  | 25
26   | 1.315     1.706  2.056   2.479   2.779  | 26
27   | 1.314     1.703  2.052   2.473   2.771  | 27
28   | 1.313     1.701  2.048   2.467   2.763  | 28
29   | 1.311     1.699  2.045   2.462   2.756  | 29
Inf. | 1.282     1.645  1.960   2.326   2.576  | Inf.
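Three rows of the t table can be checked in closed form, assuming P is the two-tailed probability of exceeding |t|:

- With 1 d.f., t follows the Cauchy distribution, so t = tan(π(1 − P)/2).
- With 2 d.f., the distribution function is 1/2 + t/(2√(2 + t²)), which gives t = √(2a²/(1 − a²)) with a = 1 − P.
- The Inf. row is the standard normal point, taken here from Python's `statistics.NormalDist`.

This is a checking sketch for those rows only, not a general t-quantile routine. The function name is illustrative.

```python
import math
from statistics import NormalDist


def t_two_tailed(P: float, df: float) -> float:
    """Value of t exceeded in absolute value with probability P (df = 1, 2 or infinity)."""
    a = 1 - P
    if df == 1:                  # 1 d.f.: the Cauchy distribution
        return math.tan(math.pi * a / 2)
    if df == 2:                  # 2 d.f.: invert 1/2 + t / (2*sqrt(2 + t*t))
        return math.sqrt(2 * a * a / (1 - a * a))
    if math.isinf(df):           # limiting case: the standard normal curve
        return NormalDist().inv_cdf(1 - P / 2)
    raise ValueError("this sketch covers only df = 1, 2 and infinity")


for df in (1, 2, math.inf):
    print(df, [round(t_two_tailed(P, df), 3) for P in (0.20, 0.10, 0.05, 0.02, 0.01)])
```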


xliv SIGNIFICANCE POINTS OF χ²

XI. SIGNIFICANCE POINTS OF χ²

d.f. | P = .99  .95     .50     .10     .05     .02     .01
1    | .000157  .00393  .455    2.706   3.841   5.412   6.635
2    | .0201    .103    1.386   4.605   5.991   7.824   9.210
3    | .115     .352    2.366   6.251   7.815   9.837   11.345
4    | .297     .711    3.357   7.779   9.488   11.668  13.277
5    | .554     1.145   4.351   9.236   11.070  13.388  15.086
6    | .872     1.635   5.348   10.645  12.592  15.033  16.812
7    | 1.239    2.167   6.346   12.017  14.067  16.622  18.475
8    | 1.646    2.733   7.344   13.362  15.507  18.168  20.090
9    | 2.088    3.325   8.343   14.684  16.919  19.679  21.666
10   | 2.558    3.940   9.342   15.987  18.307  21.161  23.209
11   | 3.053    4.575   10.341  17.275  19.675  22.618  24.725
12   | 3.571    5.226   11.340  18.549  21.026  24.054  26.217
13   | 4.107    5.892   12.340  19.812  22.362  25.472  27.688
14   | 4.660    6.571   13.339  21.064  23.685  26.873  29.141
15   | 5.229    7.261   14.339  22.307  24.996  28.259  30.578
16   | 5.812    7.962   15.338  23.542  26.296  29.633  32.000
17   | 6.408    8.672   16.338  24.769  27.587  30.995  33.409
18   | 7.015    9.390   17.338  25.989  28.869  32.346  34.805
19   | 7.633    10.117  18.338  27.204  30.144  33.687  36.191
20   | 8.260    10.851  19.337  28.412  31.410  35.020  37.566
21   | 8.897    11.591  20.337  29.615  32.671  36.343  38.932
22   | 9.542    12.338  21.337  30.813  33.924  37.659  40.289
23   | 10.196   13.091  22.337  32.007  35.172  38.968  41.638
24   | 10.856   13.848  23.337  33.196  36.415  40.270  42.980
25   | 11.524   14.611  24.337  34.382  37.652  41.566  44.314
26   | 12.198   15.379  25.336  35.563  38.885  42.856  45.642
27   | 12.879   16.151  26.336  36.741  40.113  44.140  46.963
28   | 13.565   16.928  27.336  37.916  41.337  45.419  48.278
29   | 14.256   17.708  28.336  39.087  42.557  46.693  49.588
30   | 14.953   18.493  29.336  40.256  43.773  47.962  50.892

Note. For degrees of freedom (ν) greater than 30, the quantity √(2χ²) − √(2ν − 1) may be used as a normal variate with unit variance.
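The note can be turned around to approximate a χ² point. If √(2χ²) − √(2ν − 1) behaves like a standard normal variate z, then the upper P point is roughly χ² ≈ ½(z_P + √(2ν − 1))², where z_P is the upper P point of the normal curve. The sketch below assumes exactly that and takes z_P from Python's `statistics.NormalDist`. At ν = 30, the edge of the printed table, it can be compared with the exact entries. The approximation is meant for larger ν, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def chi2_point_approx(P: float, df: int) -> float:
    """Approximate upper P point of chi-square, treating
    sqrt(2*chi2) - sqrt(2*df - 1) as a standard normal variate."""
    z = NormalDist().inv_cdf(1 - P)  # upper-tail point of the normal curve
    return 0.5 * (z + math.sqrt(2 * df - 1)) ** 2


print(round(chi2_point_approx(0.05, 30), 2))  # the table gives 43.773 for 30 d.f.
print(round(chi2_point_approx(0.05, 60), 2))  # beyond the table's range
```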


xlv SIGNIFICANCE POINTS OF THE VARIANCE-RATIO F

XII. SIGNIFICANCE POINTS OF THE VARIANCE-RATIO F

5 per cent points


xlvi 1% POINTS OF F

XIII. 1% Points of F

ν2 \ ν1 | 1      2       3      4      5      6      8      12     24     ∞
1       | 4052   4999.5  5403   5625   5764   5859   5981   6106   6234   6366
2       | 98.50  99.00   99.17  99.25  99.30  99.33  99.37  99.42  99.46  99.50
3       | 34.12  30.82   29.46  28.71  28.24  27.91  27.49  27.05  26.60  26.13
4       | 21.20  18.00   16.69  15.98  15.52  15.21  14.80  14.37  13.93  13.46
5       | 16.26  13.27   12.06  11.39  10.97  10.67  10.29  9.89   9.47   9.02
6       | 13.75  10.92   9.78   9.15   8.75   8.47   8.10   7.72   7.31   6.88
7       | 12.25  9.55    8.45   7.85   7.46   7.19   6.84   6.47   6.07   5.65
8       | 11.26  8.65    7.59   7.01   6.63   6.37   6.03   5.67   5.28   4.86
9       | 10.56  8.02    6.99   6.42   6.06   5.80   5.47   5.11   4.73   4.31
10      | 10.04  7.56    6.55   5.99   5.64   5.39   5.06   4.71   4.33   3.91
11      | 9.65   7.21    6.22   5.67   5.32   5.07   4.74   4.40   4.02   3.60
12      | 9.33   6.93    5.95   5.41   5.06   4.82   4.50   4.16   3.78   3.36
13      | 9.07   6.70    5.74   5.21   4.86   4.62   4.30   3.96   3.59   3.17
14      | 8.86   6.51    5.56   5.04   4.69   4.46   4.14   3.80   3.43   3.00
15      | 8.68   6.36    5.42   4.89   4.56   4.32   4.00   3.67   3.29   2.87
16      | 8.53   6.23    5.29   4.77   4.44   4.20   3.89   3.55   3.18   2.75
17      | 8.40   6.11    5.18   4.67   4.34   4.10   3.79   3.46   3.08   2.65
18      | 8.29   6.01    5.09   4.58   4.25   4.01   3.71   3.37   3.00   2.57
19      | 8.18   5.93    5.01   4.50   4.17   3.94   3.63   3.30   2.92   2.49
20      | 8.10   5.85    4.94   4.43   4.10   3.87   3.56   3.23   2.86   2.42
21      | 8.02   5.78    4.87   4.37   4.04   3.81   3.51   3.17   2.80   2.36
22      | 7.95   5.72    4.82   4.31   3.99   3.76   3.45   3.12   2.75   2.31
23      | 7.88   5.66    4.76   4.26   3.94   3.71   3.41   3.07   2.70   2.26
24      | 7.82   5.61    4.72   4.22   3.90   3.67   3.36   3.03   2.66   2.21
25      | 7.77   5.57    4.68   4.18   3.85   3.63   3.32   2.99   2.62   2.17
26      | 7.72   5.53    4.64   4.14   3.82   3.59   3.29   2.96   2.58   2.13
27      | 7.68   5.49    4.60   4.11   3.78   3.56   3.26   2.93   2.55   2.10
28      | 7.64   5.45    4.57   4.07   3.75   3.53   3.23   2.90   2.52   2.06
29      | 7.60   5.42    4.54   4.04   3.73   3.50   3.20   2.87   2.49   2.03
30      | 7.56   5.39    4.51   4.02   3.70   3.47   3.17   2.84   2.47   2.01
40      | 7.31   5.18    4.31   3.83   3.51   3.29   2.99   2.66   2.29   1.80
60      | 7.08   4.98    4.13   3.65   3.34   3.12   2.82   2.50   2.12   1.60
120     | 6.85   4.79    3.95   3.48   3.17   2.96   2.66   2.34   1.95   1.38
∞       | 6.64   4.60    3.78   3.32   3.02   2.80   2.51   2.18   1.79   1.00
XIV. Control Chart Constants

Chart for averages: factors for control limits (A, A1, A2).
Chart for standard deviations: factor for central line (c2) and factors for control limits (B1, B2, B3, B4).
Chart for ranges: factor for central line (d2) and factors for control limits (D1, D2, D3, D4).
n = number of observations in sample.

n  | A     A1    A2    | c2     | B1    B2    B3    B4    | d2    | D1    D2    D3    D4
2  | 2.121 3.760 1.880 | 0.5642 | 0     1.843 0     3.267 | 1.128 | 0     3.686 0     3.267
3  | 1.732 2.394 1.023 | 0.7236 | 0     1.858 0     2.568 | 1.693 | 0     4.358 0     2.575
4  | 1.500 1.880 0.729 | 0.7979 | 0     1.808 0     2.266 | 2.059 | 0     4.698 0     2.282
5  | 1.342 1.596 0.577 | 0.8407 | 0     1.756 0     2.089 | 2.326 | 0     4.918 0     2.115
6  | 1.225 1.410 0.483 | 0.8686 | 0.026 1.711 0.030 1.970 | 2.534 | 0     5.078 0     2.004
7  | 1.134 1.277 0.419 | 0.8882 | 0.105 1.672 0.118 1.882 | 2.704 | 0.205 5.203 0.076 1.924
8  | 1.061 1.175 0.373 | 0.9027 | 0.167 1.638 0.185 1.815 | 2.847 | 0.387 5.307 0.136 1.864
9  | 1.000 1.094 0.337 | 0.9139 | 0.219 1.609 0.239 1.761 | 2.970 | 0.546 5.394 0.184 1.816
10 | 0.949 1.028 0.308 | 0.9227 | 0.262 1.584 0.284 1.716 | 3.078 | 0.687 5.469 0.223 1.777
11 | 0.905 0.973 0.285 | 0.9300 | 0.299 1.561 0.321 1.679 | 3.173 | 0.812 5.534 0.256 1.744
12 | 0.866 0.925 0.266 | 0.9359 | 0.331 1.541 0.354 1.646 | 3.258 | 0.924 5.592 0.284 1.716
13 | 0.832 0.884 0.249 | 0.9410 | 0.359 1.523 0.382 1.618 | 3.336 | 1.026 5.646 0.308 1.692
14 | 0.802 0.848 0.235 | 0.9453 | 0.384 1.507 0.406 1.594 | 3.407 | 1.121 5.693 0.329 1.671
15 | 0.775 0.816 0.223 | 0.9490 | 0.406 1.492 0.428 1.572 | 3.472 | 1.207 5.737 0.348 1.652

Statistic | Standards given: central line | limits     | Analysis of past data: central line | limits
X̄         | X̄′                            | X̄′ ± Aσ′   | X̿                                   | X̿ ± A1σ̄, or X̿ ± A2R̄
σ         | c2σ′                          | B1σ′, B2σ′ | σ̄                                   | B3σ̄, B4σ̄
R         | d2σ′                          | D1σ′, D2σ′ | R̄                                   | D3R̄, D4R̄
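The "analysis of past data" rows can be applied as follows:

- The X̄ chart has central line X̿ and limits X̿ ± A2R̄.
- The R chart has central line R̄ and limits D3R̄ and D4R̄.

The sketch below hard-codes the factors for subgroups of five (A2 = 0.577, D3 = 0, D4 = 2.115) from the table above. The measurements are hypothetical, and so are the function and variable names. Other subgroup sizes need their own row of factors.

```python
from statistics import mean

# Factors for subgroups of size n = 5, read from the control chart constants table
A2, D3, D4 = 0.577, 0.0, 2.115


def xbar_r_limits(subgroups: list[list[float]]) -> dict[str, tuple[float, float, float]]:
    """(lower limit, central line, upper limit) for X-bar and R charts from past data."""
    xbars = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    x_dbar, r_bar = mean(xbars), mean(ranges)
    return {
        "xbar": (x_dbar - A2 * r_bar, x_dbar, x_dbar + A2 * r_bar),
        "range": (D3 * r_bar, r_bar, D4 * r_bar),
    }


# Hypothetical measurements: three subgroups of five items each
data = [[10, 12, 11, 13, 9], [11, 11, 12, 10, 11], [12, 13, 11, 12, 12]]
limits = xbar_r_limits(data)
print(limits)
```

A subgroup whose mean or range falls outside these limits would be flagged for investigation.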
RANDOM NUMBERS

XV. Random Numbers

51772 74640 42331 29044 46621 | 62898 93582 04186 19640 87056
24033 23491 83587 06568 21960 | 21387 76105 10863 97453 90581
45939 60173 52078 25424 11645 | 55870 56974 37428 93507 94277
30586 02133 75797 45406 31041 | 86707 12973 17169 88116 42181
03585 79353 81938 82322 96799 | 85659 36081 50884 14070 74950

64937 03355 95863 20790 65304 | 55189 00745 65253 11822 15804
15630 64759 51135 98527 62586 | 41989 25439 88036 24034 67283
09448 56301 57683 30277 94623 | 85418 68829 06652 41982 49159
21631 91157 77331 60710 52290 | 16835 48653 71590 16159 14676
91097 17480 29414 06829 87843 | 28195 27279 47152 35683 47280

50532 25496 95652 42457 73547 | 76552 50020 24819 52984 76168
07136 40876 79971 54195 25708 | 51817 36732 72484 94923 75936
27989 64728 10744 08396 56242 | 90985 28868 99431 50995 20507
85184 73949 36601 46253 00477 | 25234 09908 36574 72139 70185
54398 21154 97810 36764 32869 | 11785 55261 59009 38714 38723

65544 34371 09591 07839 58392 | 92843 72828 91341 84821 63886
08263 65952 85762 64236 39238 | 18776 84303 99247 46149 03229
39817 67906 48236 16057 81812 | 15815 63700 85915 19219 45943
62257 04077 79443 95203 02479 | 30763 92486 54083 23631 05825
53298 90276 62545 21944 16530 | 03878 07516 95715 02526 33537


XVI. Conversion of a Pearson r into a corresponding Fisher's z coefficient*

r    | z
.920 | 1.59
.925 | 1.62
.930 | 1.66
.945 | 1.78

*Values of r under .25 may be taken as equivalent to z's.
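Fisher's transformation is z = ½ ln((1 + r)/(1 − r)), which is the inverse hyperbolic tangent of r. The sketch below computes it with the standard `math` module. It reproduces the rows that survive in the conversion table. It also shows why small correlations can be used directly, since z ≈ r when r is small. The function name is illustrative.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z = (1/2) ln((1 + r) / (1 - r)), the inverse hyperbolic tangent of r."""
    return 0.5 * math.log((1 + r) / (1 - r))


for r in (0.920, 0.925, 0.930, 0.945, 0.24):
    print(r, round(fisher_z(r), 2))
```

At two decimal places, r = .24 gives z = .24, consistent with the footnote that values of r under .25 may be used as z directly.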